(54) A design environment and a design method for hardware/software co-design



(57) A design environment and a design method for implementing a heterogeneous essentially digital system is disclosed. The design environment comprises:

a database compiled on a computer, adapted for access by executable programs on said computer for generating the implementation of said heterogeneous essentially digital system, comprising a plurality of objects representing aspects of said digital system, wherein said objects comprise primitive objects representing the specification of said digital system and hierarchical objects being created by said executable programs while generating the implementation of said digital system, said hierarchical objects being refinements of said primitive objects and having more detail and preserving any one or all of said aspects to thereby generate said implementation of said digital system; and further comprising relations between said primitive objects, between said hierarchical objects, and between said primitive objects and said hierarchical objects; and further comprising functions for manipulating said objects and said relations; means for specifying said heterogeneous digital system comprising a plurality of behavioral and structural languages; means for simulating said heterogeneous digital system comprising a plurality of simulators for said behavioral and structural languages; means for implementing said heterogeneous digital system comprising a plurality of compilers for said behavioral and structural languages; means for allocating hardware components for an implementation of said heterogeneous digital system; means for assigning hardware subsystems and software subsystems of said heterogeneous digital system to said hardware components; means for implementing the communication between said software subsystems and said hardware subsystems, one of the aspects of said communication being represented by ports; means for encapsulating said simulators, said compilers, said hardware components, said hardware subsystems and said software subsystems, thereby creating a consistent communication between said encapsulated simulators, compilers, hardware components, hardware subsystems and software subsystems.

Furthermore, a method is disclosed for making an implementation of a heterogeneous essentially digital system, comprising the steps of: defining a first set of primitive objects representing the specification of said digital system, comprising the steps of: describing the specification of said system in a plurality of processes, each process representing a functional aspect of said system, said processes being primitive objects; defining ports and connecting said ports with channels, said ports structuring the communication between said processes, said ports and said channels being primitive objects, one process having one or more ports; defining the communication semantics of said ports by a protocol, said protocol being a primitive object; and thereafter creating hierarchical objects being refinements of said primitive objects and having more detail, while preserving aspects of said communication semantics.
Description

Field of the invention

The present invention relates to a design environment and a design method for hardware/software co-design. More specifically, the hardware/software co-design of this invention comprises the specification, synthesis, and simulation of heterogeneous systems.

Background of the invention

Digital communication techniques form the basis of the rapid breakthrough of modern consumer electronics, wireless and wired voice- and data-networking products, broadband networks and multimedia applications. Such products are based on digital communication systems, which are made possible by the combination of VLSI technology and Digital Signal Processing.

Digital systems perform real-time transformations on time-discrete digitized samples of analogue quantities with finite bandwidth and signal-to-noise ratio. These transformations can be specified in programming languages and executed on a programmable processor or directly on application-specific hardware. The choice is determined by trade-offs between cost, performance, power and flexibility.

Hence digital systems are a candidate par excellence for hardware/software co-design.

In contrast to analogue processing, digital processing guarantees perfect reproducibility, storage and testability. Signal quality is a matter of exact mathematical operations. The price paid is the cost of hardware and the performance needed to satisfy the hard real-time character. This problem is now solved by the abundance of digital VLSI (Very Large Scale Integration) technology, which provides cheap storage and high-speed computation. Therefore, the combination of VLSI technology and digital processing has made possible the breakthrough of modern consumer electronics, portable and personal communication, broadband networks, multimedia, and automotive applications.

The design process of the products for these applications is subject to a number of constraints. A first constraint is that they must be implemented in silicon or another hardware platform for power, performance and cost reasons. A second constraint is that these products implement systems conceived by a highly specialized system team thinking in terms of executable concurrent programming paradigms which, today, are not well understood by hardware designers. Hence most specifications are first translated into English and then redesigned in a specific hardware description language such as VHDL or VERILOG for the hardware components and a software description language such as C or assembler for the software components. Although the hardware and software interact tightly, they are designed separately. Only after system assembly are the software and hardware run together. As a consequence, the design can be far from optimal or even erroneous, making a redesign cycle mandatory. This gap between system design and implementation is rapidly becoming the most important bottleneck in the design process of said products or said systems. Another constraint is that, for reasons of cost-effectiveness and time-to-market, design productivity must increase by at least an order of magnitude. Yet another constraint is that re-use of designs as well as a design-for-reuse methodology will have to be adopted. Said methodology implies hardware/software co-design at several levels of implementation.
J. Buck et al., in "PTOLEMY: A framework for simulating and prototyping heterogeneous systems" (International Journal on Computer Simulation, January 1994), focus on an environment for hardware/software co-simulation. The proposed methodology only allows for hardware/software co-design of systems based on a Data-Flow algorithm. Furthermore, hardware/software interface synthesis is not supported.

United States Patent No. 5,197,016 discloses a computer-aided system and method for designing an application-specific integrated circuit whose intended function is implemented both by hardware subsystems and software subsystems. The proposed methodology only allows for a single-processor design and is only valid for specifications based on a state transition diagram. The hardware/software co-design of systems based on a heterogeneous specification is not supported.

S. Narayan, F. Vahid, and D. Gajski, in "System specification with the SpecCharts language" (IEEE Design & Test of Computers, pages 6-13, December 1992), disclose a methodology that builds on VHDL. The methodology does not support the hardware/software co-design of systems based on a heterogeneous specification.

P. Chou, R. Ortega, and G. Borriello, in "Synthesis of the hardware/software interface in microcontroller-based systems" (Proceedings of the IEEE International Conference on Computer-Aided Design, ICCAD-92, pages 488-495, November 1992), show a method for hardware/software interface generation for microcontroller-based systems. Said method assumes that the user determines the software interfacing, such as the communication with drivers, before the start of the system synthesis task.

None of the prior art solutions provides a design environment based on a data-model that allows one to specify, simulate and implement or synthesize heterogeneous hardware/software implementations starting from a heterogeneous system specification. In the following paragraphs of this section, the characteristics of specifications of such heterogeneous systems are analyzed.
Problem Definition

In the strictest sense, digital systems are algorithms mapping digital signals into digital signals in real time. The real-time constraint is determined by the repetition period of the algorithm for consuming an input frame and producing a new output frame.

The periodicity of this constraint and the nature of the signals imply that the elementary algorithm is a data-flow function.

A Synchronous Data-Flow (SDF) algorithm can be modeled as an acyclic graph where nodes are operators and edges represent data-precedences. This graph is called a data-flow graph. An operator can execute when a predetermined, fixed number of data (tokens) is present at its input, and it then produces one or more output data (tokens).
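Because every operator consumes and produces a fixed number of tokens per firing, the number of firings of each operator in one period of the algorithm can be computed before execution by solving the balance equations q[src] * produced = q[dst] * consumed on every edge. The following Python sketch is illustrative only: it shows the standard balance-equation approach, it is not taken from this disclosure, and the function name `repetition_vector` and the edge tuple format are assumptions.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def repetition_vector(actors, edges):
    """Solve the SDF balance equations q[src] * produced == q[dst] * consumed.

    actors: list of actor names (the graph is assumed to be connected).
    edges:  list of (src, dst, produced_per_firing, consumed_per_firing).
    Returns the smallest positive integer firing count per actor for one period.
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative rates along the edges until a fixed point
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    for src, dst, prod, cons in edges:  # every edge must balance
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError(f"inconsistent rates on edge {src}->{dst}")
    # Scale by the least common multiple of the denominators, then reduce by the gcd.
    scale = reduce(lambda a, b: a * b // gcd(a, b), (r.denominator for r in rate.values()))
    counts = {a: int(r * scale) for a, r in rate.items()}
    g = reduce(gcd, counts.values())
    return {a: n // g for a, n in counts.items()}

# A produces 2 tokens per firing, B consumes 3; B produces 1, C consumes 2.
print(repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)]))
# -> {'A': 3, 'B': 2, 'C': 1}
```

An inconsistent graph, where no finite firing counts keep all buffers bounded, is reported as an error instead of being simulated.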

Conditional selection of two operations to a single output is allowed. Operators have no state, but arrays resulting from one algorithm execution can be saved for reuse in future executions (delay operator). Many digital processing algorithms are of this type. They can be described very efficiently by so-called applicative programming languages like SILAGE.

In contrast, dynamic data-flow (DDF) algorithms contain data-dependent token production and consumption. They allow for while and if-then-else constructs.
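As an illustration of data-dependent token production, the following minimal Python sketch models a DDF "switch" operator. The operator and its function name `switch_fire` are assumptions chosen for illustration and do not come from this disclosure. Which output receives a token depends on the value of a control token, so the per-output production rate cannot be fixed in advance as it can in SDF.

```python
from collections import deque

def switch_fire(data_in: deque, control_in: deque, out_true: deque, out_false: deque) -> bool:
    """Fire a DDF 'switch' operator once, if it is enabled.

    Consumes one data token and one boolean control token and produces the
    data token on exactly one of the two outputs. Which output receives a
    token depends on the control *value*, so the production rate of each
    output is data-dependent.
    """
    if not data_in or not control_in:
        return False  # not enough tokens: the operator is not enabled
    token = data_in.popleft()
    if control_in.popleft():
        out_true.append(token)
    else:
        out_false.append(token)
    return True
```

Feeding data tokens 10, 20, 30 with control tokens True, False, True routes 10 and 30 to `out_true` and 20 to `out_false`. An if-then-else construct is typically built from such a switch together with a matching select operator that merges the two branches again.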

Computer-Aided Design (CAD) environments for digital systems, such as DSP-Station of Mentor Graphics, PTOLEMY, GRAPE-II, COSSAP, etc., allow for specification of SDF and DDF and use static scheduling as much as possible to provide simulation speeds that are up to two orders of magnitude faster than event-driven simulators such as those in use for VHDL. This justifies the use of these simulation paradigms for digital system specification and validation.
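The speed advantage of static scheduling comes from computing the firing order once, before simulation. The simulator then just executes a fixed list in a loop, without an event queue. The following Python sketch builds one period of such a sequential schedule. It takes as input the firing counts per period (the solution of the SDF balance equations) and repeatedly fires any enabled operator that still owes firings. This greedy heuristic is only one simple way to do it and is not the scheduling algorithm of any of the tools named above. The function name and argument formats are assumptions.

```python
def static_schedule(firings, edges, delays=None):
    """Build one period of a sequential SDF schedule at compile time.

    firings: dict actor -> number of firings per period.
    edges:   list of (src, dst, produced_per_firing, consumed_per_firing).
    delays:  optional dict edge_index -> initial tokens (delay operators).
    """
    tokens = [0] * len(edges)
    for i, n in (delays or {}).items():
        tokens[i] = n
    remaining = dict(firings)
    schedule = []
    while any(remaining.values()):
        progress = False
        for actor in remaining:
            if remaining[actor] == 0:
                continue
            inputs = [i for i, e in enumerate(edges) if e[1] == actor]
            if all(tokens[i] >= edges[i][3] for i in inputs):  # actor is enabled
                for i in inputs:
                    tokens[i] -= edges[i][3]  # consume input tokens
                for i, e in enumerate(edges):
                    if e[0] == actor:
                        tokens[i] += e[2]  # produce output tokens
                remaining[actor] -= 1
                schedule.append(actor)
                progress = True
        if not progress:
            raise RuntimeError("deadlock: no admissible sequential schedule")
    return schedule

print(static_schedule({"A": 3, "B": 2, "C": 1}, [("A", "B", 2, 3), ("B", "C", 1, 2)]))
# -> ['A', 'A', 'B', 'A', 'B', 'C']
```

Production tools usually also optimize such schedules for buffer size or code size. That optimization is omitted here.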

However, when we consider digital processing systems in the broad sense, a wider scope is necessary, as illustrated in FIGURE 1, which is an abstraction of many practical implementations of digital processing systems. A careful look at FIGURE 1 allows us to identify five common characteristics of digital processing system specifications, numbered 1) to 5) in the sequel:


1) Digital systems typically comprise one (or more) signal paths 1 as well as slow control loops 2 and a reactive control system 3, which takes events 4 of a slow environment such as a user interface (UI) 5 and slow status information 6 of the signal paths as inputs to control the mode or parameters of the signal paths.

2) A signal path 1 is usually a concatenation of data-flow functional blocks (DFBs) 7, such as h1, h2, L2, often operating at fairly different data- and execution-rates and transforming the format of the data. The rate and format differences naturally result from operations such as frequency down- or up-conversion, bit-to-symbol modulation, data-compression and error-correction coding. When these DFBs operate on unfragmented signal words, they can best be specified as data-flow algorithms (e.g. in SILAGE, DFL, or C). Others that manipulate individual bits of the signals can be directly specified as Finite State Machines with Data paths (FSMD) at VHDL register-transfer or behavioral level. Hence the specification format depends on the type of data-flow functional block.

3) DFBs in the signal path are internally strongly interconnected data-flow graphs with sparse external communication. Hence, from an implementation viewpoint, they are seldom partitioned over several hardware or software components. Rather, they will be merged onto the same component if throughput and rate constraints allow this. Merging implies sequentializing the concurrent processes on a single component while still satisfying the timing constraints. This requires software synthesis: encapsulation techniques for single-thread compilers that allow real-time scheduling of concurrent processes.

4) Control loops and mode control by parameter setting are common to almost all digital processing systems. For example, all digital communication systems have tracking and acquisition loops in order to synchronize the frequency and phase of the receiver signal path to the characteristics of the incoming signal. Design of these loops is one of the most difficult tasks, since their characteristics depend strongly on the noise and distortion properties of the communication channel. It involves the design of phase-locked loops, delay-locked loops, and fast Fourier transforms, controlled by "events" disturbing the regularity of the signal streams.

The occurrence rate of these events is orders of magnitude slower than the data-rate in the signal path. Hence, similar to the UI, the processes modeling these slow control loops and mode setting have reactive rather than data-flow semantics.

They run concurrently with the data-flow and often consist themselves of concurrent processes. Such a control-dominated system can be described as a Program State Machine (PSM), which is a hierarchy of program-states in which each program-state represents a distinct mode of computation. Formalisms such as StateCharts or SpecCharts, which include behavioral hierarchy, exception handling and inter-process communication modeling, are needed to describe such systems. In practice, synchronization is very often specified in one or more concurrent C programs.

5) Digital systems contain both high and low data-rate blocks in the signal path. High data-rate blocks are synthesized directly in hardware. Low data-rate blocks are candidates for implementation on programmable processors. Hence digital systems are natural candidates for hardware/software co-design.
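Characteristic 3) states that DFBs merged onto one component must be sequentialized into a single thread of control. The following Python sketch shows the idea on two hypothetical processes, `source` and `gain`. Each process is written as a generator that yields after every step, and `run_merged` interleaves them round-robin on one thread. This is an assumption-laden illustration, not the merge transformation of this disclosure: it ignores the timing and throughput constraints that a real merge must still satisfy, and `gain` loops forever if it is asked for more samples than are produced.

```python
from collections import deque
from typing import Iterable, Iterator

def source(n: int, out: deque) -> Iterator[None]:
    """Concurrent process 1: emits n samples."""
    for i in range(n):
        out.append(i)
        yield  # hand control back to the merged scheduler

def gain(inp: deque, out: deque, count: int) -> Iterator[None]:
    """Concurrent process 2: multiplies each of `count` samples by 2."""
    done = 0
    while done < count:
        if inp:
            out.append(2 * inp.popleft())
            done += 1
        yield

def run_merged(processes: Iterable[Iterator[None]]) -> None:
    """Sequentialize concurrent processes on one thread (round-robin)."""
    active = list(processes)
    while active:
        for p in list(active):  # iterate over a copy so finished processes can be removed
            try:
                next(p)
            except StopIteration:
                active.remove(p)

a, b = deque(), deque()
run_merged([source(3, a), gain(a, b, 3)])
print(list(b))  # -> [0, 2, 4]
```

Because the two processes share only a local queue after merging, a compiler could in principle go further and inline one into the other. That step is what removes the modularity overhead.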

From the above it follows that digital systems require a combination of data-models for their specification. Specification languages are tightly coupled to these data-models, paradigms, simulators, and synthesis tools.

Nowadays, the dominant specification language of the digital system designer is C or a DFL for the main signal path, whereas FSMDs and PSMs are usually described in an HDL. For the description of communication channels and communication protocols, other formalisms such as timing diagrams, Extended Signal Transition Graphs, and Communicating Sequential Processes must be considered. A CAD system for digital systems must be able to encapsulate all these paradigms and their associated languages and design environments.

Digital systems design thus requires the ability to mix data-flow and reactive paradigms with widely different time constants. The difference in time constants between control-flow and data-flow poses special problems in simulation. It requires all processes to be simulatable at the highest possible abstraction level.

Not only the specification of a digital system is heterogeneous by nature; the implementation architecture of a digital system is heterogeneous as well. An example implementation architecture comprises the following types of components and the communication between these components:

• programmable processors

• application-specific processors with hardwired controller

• application-specific processors with specialized instruction set

• hardware accelerators

• micro controllers

• communication blocks and memory

• peripherals (DMA, UART)

Thus, a design method for a digital system must bridge the gap between the heterogeneous specification of the system and its heterogeneous implementation. Today's synthesis tools and compilers allow us to synthesize or program all the processor, accelerator and memory components once the global system architecture has been defined. However, the availability of these component compilers is necessary, but not sufficient. What is needed are the models and tools to refine the functional specification of a system into the detailed architecture: the definition and allocation of the components and their communication and synchronization. The most essential step is to generate the necessary software and hardware to make processors, accelerators, and the environment communicate.

One of the keys to mastering the complexity of digital system design is the reuse of components. The design process for a digital system must allow the modeling of reusable components and support a design-for-reuse methodology, which allows one to design components that are easily reusable. The problem in reusing previously designed components lies in the fixed communication protocols they use. These protocols necessitate protocol conversions when processors with different protocols have to be interfaced. Nowadays, the protocol is selected while the component is being designed, so that functional and communication behavior are intrinsically mixed. However, a good selection of the protocol is possible only when all components involved in the communication are known. Therefore, a design environment for digital systems has to allow a component to be initially described purely functionally. Later, when the component is (re)used in a system, the design environment must allow the most appropriate communication behavior to be plugged in. This approach is in contrast with current hardware (VHDL) design practices, where communication and functional behavior are mixed.
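The separation argued for above can be illustrated with a small Python sketch in which a component is only a function and the communication behavior is a separate object plugged into its ports at (re)use time. All names here (`Link`, `FifoLink`, `HandshakeLink`, `Component`) are hypothetical and serve only to show the principle. They are not the interfaces of the design environment disclosed below.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Protocol

class Link(Protocol):
    """Communication behavior that can be plugged into a component's port."""
    def can_send(self) -> bool: ...
    def send(self, value: int) -> None: ...
    def can_receive(self) -> bool: ...
    def receive(self) -> int: ...

class FifoLink:
    """Buffered link: the sender never waits (unbounded FIFO)."""
    def __init__(self) -> None:
        self._q: deque = deque()
    def can_send(self) -> bool:
        return True
    def send(self, value: int) -> None:
        self._q.append(value)
    def can_receive(self) -> bool:
        return bool(self._q)
    def receive(self) -> int:
        return self._q.popleft()

class HandshakeLink:
    """Single-place link: a new value may be sent only after the last one was taken."""
    def __init__(self) -> None:
        self._value = 0
        self._full = False
    def can_send(self) -> bool:
        return not self._full
    def send(self, value: int) -> None:
        self._value, self._full = value, True
    def can_receive(self) -> bool:
        return self._full
    def receive(self) -> int:
        self._full = False
        return self._value

@dataclass
class Component:
    behavior: Callable[[int], int]  # purely functional description
    inp: Link                       # communication behavior, chosen when the component is (re)used
    out: Link

    def step(self) -> bool:
        """Apply the function to one input value if both links allow it."""
        if self.inp.can_receive() and self.out.can_send():
            self.out.send(self.behavior(self.inp.receive()))
            return True
        return False
```

The same `Component(lambda x: 2 * x, ...)` works unchanged with either link type. Only the plugged-in link decides whether the sender may run ahead of the receiver.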

Another key to mastering the complexity of digital system design is modularity. In modular designs, the complete system functionality is split into communicating components of manageable complexity. The advantage of this approach is that the components can be reused and that the system is easier to adapt and maintain.

The disadvantage is the overhead caused by the inter-component communication, or by the compiler not optimizing across component boundaries. Therefore, the inter-component communication semantics should be such that modularity can be removed easily when merging two components into a single component.

In the past, a lot of effort has been put into design environments that allow the components of a digital system to be implemented.

Languages with associated simulators, tuned towards specific application domains, allow components to be specified and simulated at a high abstraction level. Hardware compilers can implement the component description as processors with highly specialized architectures. Software compilers allow machine code to be generated for off-the-shelf programmable processors. Instruction set simulators allow the machine code to be debugged at different levels of abstraction (C, asm). Examples of such design environments are Cathedral-1/2/3, the ARM processor tool suite (C-compiler and the ARMulator), and the Synopsys synthesis tools. From the above it can be concluded that the components of digital systems can be implemented with off-the-shelf design environments. What is missing is the glue that links these design environments together and automatically interfaces the generated or off-the-shelf processors according to the system specification. Hence, a system design environment should allow existing design environments to be included easily. It should provide synthesis tools for hardware/hardware and hardware/software interfacing that are processor- and design-environment-independent. To achieve this, the specification method must allow off-the-shelf components to be modeled on an as-is basis.

In summary, the following requirements can be defined for a hardware/software system design environment:

• Modularity is essential to master complexity, but the overhead should be minimal and removable.

• Different description languages are needed to allow each system component to be described with the most appropriate paradigm.

• The design environment must be able to model the heterogeneous conceptual specification, the resulting heterogeneous architecture and all refinement steps in between.

• Off-the-shelf components and the associated design environments need to be modeled.

• A clear separation between functional and communication behavior is required to allow designs to be reused.

• Processor-independent interface synthesis is essential.


Summary of the invention

A design methodology and a design environment meeting the above-stated requirements for a hardware/software system co-design environment are disclosed in the present application. The hardware/software co-design environment and design methodology are based on a data-model that allows heterogeneous hardware/software architectures to be specified, simulated, and synthesized from a heterogeneous specification. Said environment and said methodology are based on the principle of encapsulation of existing hardware and software compilers and allow for the interactive synthesis of hardware/software and hardware/hardware interfaces.

It is a first object of the present invention to disclose a design environment for implementing a heterogeneous essentially digital system. Said system comprises: a database compiled on a memory structure adapted for access by executable programs on a computer for generating the implementation of said heterogeneous essentially digital system, comprising a plurality of objects representing aspects of said digital system, wherein said objects comprise primitive objects representing the specification of said digital system and hierarchical objects being created by said executable programs while generating the implementation of said digital system, said hierarchical objects being refinements of said primitive objects and having more detail and preserving any one or all of said aspects to thereby generate said implementation of said digital system; and further comprising relations between said primitive objects, between said hierarchical objects, and between said primitive objects and said hierarchical objects; and further comprising functions for manipulating said objects and said relations. Said system further comprises means for specifying said heterogeneous digital system comprising a plurality of behavioral and structural languages; means for simulating said heterogeneous digital system comprising a plurality of simulators for said behavioral and structural languages; means for implementing said heterogeneous digital system comprising a plurality of compilers for said behavioral and structural languages; means for allocating hardware components for an implementation of said heterogeneous digital system; means for assigning hardware subsystems and software subsystems of said heterogeneous digital system to said hardware components; means for implementing the communication between said software subsystems and said hardware subsystems, one of the aspects of said communication being represented by ports; means for encapsulating said simulators, said compilers, said hardware components, said hardware subsystems and said software subsystems, thereby creating a consistent communication between said encapsulated simulators, compilers, hardware components, hardware subsystems and software subsystems; and means for creating processor models of said hardware components as objects in said database, said models comprising software models representing the software views on said hardware components and hardware models representing the hardware views on said hardware components.
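To make the distinction between primitive and hierarchical objects concrete, the following Python sketch models a few of the object kinds named in this summary (process, port, channel, protocol) and one hierarchical object: a channel refined into a channel with internal detail. The refined channel keeps a relation back to the primitive channel it refines. The class names and fields are assumptions for illustration. They are not the actual database schema of the disclosed environment, which is described with reference to FIGURES 3 and 4.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class CommProtocol:
    """Primitive object: communication semantics of a port (e.g. 'handshake')."""
    name: str

@dataclass(eq=False)
class Port:
    name: str
    direction: str          # "in" or "out"
    protocol: CommProtocol

@dataclass(eq=False)
class Process:
    """Primitive object: one functional aspect of the system."""
    name: str
    ports: List[Port] = field(default_factory=list)

@dataclass(eq=False)
class Channel:
    """Primitive object: connects an output port to an input port."""
    src: Port
    dst: Port

@dataclass(eq=False)
class HierarchicalChannel(Channel):
    """Hierarchical object: a refinement of a primitive channel, with more detail."""
    refines: Optional[Channel] = None  # relation back to the refined primitive object
    contents: List[Process] = field(default_factory=list)

def refine_channel(ch: Channel, inner: Process) -> HierarchicalChannel:
    """Refine a channel by inserting a process, keeping both endpoint ports
    (and therefore their protocols) unchanged."""
    return HierarchicalChannel(ch.src, ch.dst, refines=ch, contents=[inner])
```

Because the refined channel reuses the same endpoint ports, the communication seen by the connected processes is preserved while detail is added inside the channel.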

In an aspect of the present invention, the design environment further comprises means for creating I/O scenario models of said ports as objects in said database, said I/O scenario models representing the implementation of said ports on said hardware components, said implementation comprising software subsystems, hardware subsystems, and processor models with connections therebetween.

In another aspect of the present invention, the implementation of the communication between a first software subsystem and a first hardware subsystem in the design environment results in said first software subsystem with a first port being replaced by a second hardware subsystem with a second port, said first port and said second port representing an essentially identical communication.

In yet another aspect of the present invention, the design environment further comprises means for selecting I/O scenario models for the ports of said first software subsystem; means for combining the software subsystems of said selected I/O scenarios; and means for combining the hardware subsystems of said selected I/O scenarios.

In the design environment, a first I/O scenario model can represent the connection of said first port to said second port, said connection comprising a connection of said first port to said software subsystems of said I/O scenario model, connections of said software subsystems of said I/O scenario model to said software model, connections of said hardware model to said hardware subsystems of said I/O scenario model, and a connection of said hardware subsystems of said I/O scenario model to said second port.

In the design environment, the I/O scenario models can further comprise memory-mapped I/O scenarios, instruction-programmed I/O scenarios, and interrupt-based I/O scenarios.
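The selection and combination of I/O scenario models can be sketched in Python as follows. Each selected scenario contributes a software part, inserted between the port and the processor's software model, and a hardware part, inserted between the processor's hardware model and the second port. The selection rule used here is an assumption made only for illustration, because the text above does not prescribe one. Event-like ports become interrupt-based, other ports use I/O instructions when the processor has them, and all remaining ports are memory-mapped at consecutive addresses. The names, the base address and the stride are likewise hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple

class IOKind(Enum):
    MEMORY_MAPPED = "memory-mapped"
    INSTRUCTION_PROGRAMMED = "instruction-programmed"
    INTERRUPT = "interrupt-based"

@dataclass(frozen=True)
class IOScenario:
    kind: IOKind
    sw_part: str                   # software subsystem: port -> processor software model
    hw_part: str                   # hardware subsystem: processor hardware model -> second port
    address: Optional[int] = None  # only used by memory-mapped scenarios

def select_io_scenarios(ports: List[Tuple[str, bool]], has_io_instructions: bool,
                        base: int = 0x4000_0000, stride: int = 4) -> Dict[str, IOScenario]:
    """Pick one I/O scenario per software port (name, is_event) with an assumed rule."""
    chosen: Dict[str, IOScenario] = {}
    next_addr = base
    for name, is_event in ports:
        if is_event:
            chosen[name] = IOScenario(IOKind.INTERRUPT, f"isr_{name}", f"irq_logic_{name}")
        elif has_io_instructions:
            chosen[name] = IOScenario(IOKind.INSTRUCTION_PROGRAMMED, f"io_{name}", f"io_port_{name}")
        else:
            chosen[name] = IOScenario(IOKind.MEMORY_MAPPED, f"ldst_{name}", f"decoder_{name}", next_addr)
            next_addr += stride
    return chosen

def combine(chosen: Dict[str, IOScenario]) -> Tuple[List[str], List[str]]:
    """Collect the software parts and the hardware parts of the selected scenarios."""
    return [s.sw_part for s in chosen.values()], [s.hw_part for s in chosen.values()]
```

In a real flow, the software parts would be compiled together with the software subsystem and the hardware parts synthesized next to the processor core.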

In an aspect of the present invention, said implementation is a simulation of said digital system. Said simulation can be a multi-abstraction-level simulation, said multi-abstraction-level simulation comprising substantially simultaneous low-level and high-level simulation.

Said simulation can be a multi-platform simulation being executed on a plurality of computers.

Said simulation can be a hybrid simulation comprising substantially simultaneous hardware implementations and computer simulations.

Said implementation can be a heterogeneous implementation comprising hardware subsystems and software subsystems, said software subsystems being executed on one or more of said hardware subsystems.

Said hardware subsystems can comprise any one or more of processor cores, off-the-shelf components, custom components, ASICs, processors, or boards.

Said software subsystems can comprise machine instructions for said hardware subsystems.

It is a second object of the present invention to disclose a method of making an implementation of a heterogeneous essentially digital system, comprising the steps of:

defining a first set of primitive objects representing the specification of said digital system, comprising the steps of:

describing the specification of said system in a plurality of processes, each process representing a functional aspect of said system, said processes being primitive objects;

defining ports and connecting said ports with channels, said ports structuring the communication between said processes, said ports and said channels being primitive objects, one process having one or more ports;

defining the communication semantics of said ports by a protocol, said protocol being a primitive object; and thereafter

creating hierarchical objects being refinements of said primitive objects and having more detail, while preserving aspects of said communication semantics; allocating one or more hardware components, said components comprising programmable processors and non-programmable processors; assigning said processes to said hardware components, the processes being assigned to a programmable processor being a software subsystem, the other processes being hardware subsystems; and selecting I/O scenario models for the ports of said software subsystem, thereby connecting said ports to the interface of said programmable processor and connecting the interface of said programmable processor to second ports, said second ports representing an essentially identical communication as said ports.
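The allocation and assignment steps above can be sketched in Python. Processes assigned to a programmable processor become software subsystems and all other processes become hardware subsystems. The channels that cross from a programmable processor to another component are then the ones whose software-side ports need I/O scenario models. The component and process names and the tuple-based channel format are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class HwComponent:
    name: str
    programmable: bool  # programmable processor vs. non-programmable processor (e.g. ASIC)

def partition(assignment: Dict[str, str],
              components: List[HwComponent]) -> Tuple[List[str], List[str]]:
    """Split processes into software subsystems (on a programmable processor)
    and hardware subsystems (all other processes)."""
    by_name = {c.name: c for c in components}
    software: List[str] = []
    hardware: List[str] = []
    for process, comp in assignment.items():
        (software if by_name[comp].programmable else hardware).append(process)
    return software, hardware

def channels_needing_io(channels: List[Tuple[str, str]], assignment: Dict[str, str],
                        components: List[HwComponent]) -> List[Tuple[str, str]]:
    """Channels whose two processes sit on different components, at least one of
    them programmable: their software-side ports need an I/O scenario."""
    by_name = {c.name: c for c in components}
    result = []
    for a, b in channels:
        ca, cb = assignment[a], assignment[b]
        if ca != cb and (by_name[ca].programmable or by_name[cb].programmable):
            result.append((a, b))
    return result

comps = [HwComponent("arm", True), HwComponent("asic", False)]
assign = {"ui": "arm", "ctrl": "arm", "filter": "asic"}
print(partition(assign, comps))  # -> (['ui', 'ctrl'], ['filter'])
print(channels_needing_io([("ui", "ctrl"), ("ctrl", "filter")], assign, comps))
# -> [('ctrl', 'filter')]
```

A channel between two processes on the same processor stays inside the software subsystem and needs no I/O scenario.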

The method can further comprise the step of simulating said system.

Said implementation comprises hardware and software subsystems of said system, said software subsystems being executed on one or more of said hardware subsystems.

The method can comprise the step of generating a netlist comprising the layout information of said implementation.

Said hardware subsystems can comprise any one or more of processor cores, off-the-shelf components, custom components, ASICs, processors, or boards.

In an aspect of the present invention, the method can further comprise the step of refining the channel between a first and a second port of respectively a first and a second hardware component, said first and said second port having incompatible protocols, thereby creating a hierarchical channel, said hierarchical channel converting the first protocol into the second protocol.
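As a small example of such a protocol-converting hierarchical channel, the following Python sketch assumes that the first port transfers one 16-bit word per transaction and the second port one 8-bit byte per transaction. The converter inside the channel splits each word, high byte first. The protocols, the byte ordering and the class name `WordToByteChannel` are assumptions for illustration only. Neither endpoint has to change, which is the point of refining the channel rather than the components.

```python
from collections import deque
from typing import Deque

class WordToByteChannel:
    """Hypothetical hierarchical channel between a 16-bit-word port and a
    byte-serial port; the inner converter splits each word, high byte first."""

    def __init__(self) -> None:
        self._bytes: Deque[int] = deque()

    def put_word(self, word: int) -> None:
        """Sender side: first protocol (one 16-bit word per transfer)."""
        if not 0 <= word <= 0xFFFF:
            raise ValueError("word does not fit in 16 bits")
        self._bytes.append(word >> 8)    # high byte
        self._bytes.append(word & 0xFF)  # low byte

    def available(self) -> int:
        """Number of bytes ready for the receiver."""
        return len(self._bytes)

    def get_byte(self) -> int:
        """Receiver side: second protocol (one byte per transfer)."""
        return self._bytes.popleft()

ch = WordToByteChannel()
ch.put_word(0x1234)
ch.put_word(0xABCD)
print([hex(ch.get_byte()) for _ in range(ch.available())])
# -> ['0x12', '0x34', '0xab', '0xcd']
```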

The method can comprise the step of refining the channels between incompatible ports of hardware components, thereby creating hierarchical channels, and the step of generating a netlist comprising the layout information of said implementation.

Brief description of the drawings

FIGURE 1 is a schematic representation of a heterogeneous digital system comprising various specification paradigms.

FIGURE 2 is a flowchart representing the method for hardware/software co-design of the present invention.

FIGURE 3 is an illustration of the primitive objects in the database and the relations between said primitive objects.

FIGURE 4 is an illustration of the hierarchical objects in the database and the relations between said hierarchical objects and between the primitive and hierarchical objects.

FIGURE 5 is an illustration of the process merge transformation.

FIGURE 6 is a flowchart of a specific embodiment of the implementation process for hardware/software co-design.

FIGURE 7 is a schematic representation of the functionality of the hardware/software interface generation.

FIGURE 8 is a schematic representation of a particular I/O scenario modeled in the database.

FIGURE 9 is a flowchart of a specific embodiment of the construction of a hardware/software co-simulation.

FIGURE 10 is a block diagram of a typical heterogeneous digital system: the pager application.

FIGURE n is a schematic representation of the 
pager application as described with the present inven- 
tion. 

FIGURE 12 is a block diagram of the pager after 
application of the process merge transformation. 

FIGURE 1 3 is a block diagramed the pager after the 
communtcatkjn channels have been tagged with a spe- 
cific communication behavior, 

FIGURE 14 is an illustration of the introduction oJ 
specific communication behavior in the pager applica- 
tion by refining a primitive channel tnio a hierarchical 
channel. 

FIGURE 15 is an illustration of ths implementation 
of a process in hardware, whereby the resulting hard- 
ware subsystems are encapsulated to make them com- 
mtintcate. 

FIGURE 16 shows the details of the encapsulation 
of the hardware subsystems in this particutef applica- 
tion. 

FIGUFlE 17 is an illustration of the generation of a 
hardware/software interface between a software sub- 
system compiled on an ARM procassorcore and a hard- 
ware subsystem. 

Detailed description of the invention 

In the sequel, a design environment and a design methodology are disclosed that meet the following requirements: modularity; encapsulation of different description languages; modeling from a heterogeneous conceptual specification to a resulting heterogeneous architecture, including all refinement steps in between; modeling capabilities for off-the-shelf components and the associated design environments; separation between functional and communication behavior; and processor-independent interface synthesis. In the sequel, said design environment is called CoWare.

In the sequel, it is to be understood that the concept of refinement means converting, translating, or transforming a specification of an electronic system into an implementation. Said implementation can be an architecture of components that has the same behavior as said specification or that executes said specification. Said implementation can also be an added development in the chain leading to a final implementation. Adding detail means that an implementation is made more specific or concrete as a result of an implementation decision on a previous level in the chain leading to a final implementation. Adding detail can also mean adding a material object, such as a specific component or a specific communication in between components, as opposed to an abstract aspect on a previous level in the chain leading to a final implementation. Other instances of the concepts of refinement and detail are to be found in the sequel.

FIGURE 2 shows the architecture of the CoWare system.

The CoWare system supports four major design activities: co-specification 8, co-simulation 9, co-synthesis 10 and interface synthesis 11. The input is a heterogeneous specification of an electronic system; the output 12 is a netlist for prior-art commercial tools for the generation of the implementation layout. Said output preferably comprises structural VHDL or Verilog and machine code for the programmable processors.

The CoWare design environment is implemented on top of a data model in which modularity is provided by means of processes. Processes contain host language encapsulations, which are used to describe the system components. Communication between processes takes place through a behavioral interface comprising ports. For two processes to be able to communicate, their ports must be connected with a channel. The inter-process communication semantics is based on the concept of the Remote Procedure Call (RPC). The data model is hierarchically structured and allows channels, ports, and protocols to be refined into lower-level objects, adding detail. We refer to the most abstract object as a primitive object. An object that contains more implementation detail is referred to as a hierarchical object.
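To make the primitive/hierarchical distinction concrete, here is a minimal Python sketch of such a data model. The class names `Process`, `Port`, and `Channel`, the `refined_by` link, and the data type `int` are hypothetical illustrations, not the CoWare API; the point is only that refining an object attaches a more detailed hierarchical object while the primitive object and its relations stay in the model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Process:
    # Container for host language encapsulations (language -> source text).
    name: str
    encapsulations: Dict[str, str] = field(default_factory=dict)


@dataclass
class Port:
    # A primitive port is characterized by a protocol and a data type.
    owner: Process
    name: str
    protocol: str
    data_type: str
    refined_by: Optional[Process] = None  # hierarchical port process, if any

    @property
    def is_hierarchical(self) -> bool:
        return self.refined_by is not None


@dataclass
class Channel:
    # Point-to-point connection of a master port and a slave port.
    master: Port
    slave: Port
    refined_by: Optional[Process] = None  # hierarchical channel process, if any

    @property
    def is_hierarchical(self) -> bool:
        return self.refined_by is not None

    def refine(self, channel_process: Process) -> None:
        # Adding detail: the primitive channel keeps its ports and is now
        # related to the hierarchical object that implements it.
        self.refined_by = channel_process


p1 = Process("P1", {"C": "/* autonomous thread */"})
p23 = Process("P23", {"C": "/* slave threads p2, p3 */"})
channel = Channel(Port(p1, "p1", "outmaster", "int"),
                  Port(p23, "p2", "inslave", "int"))
channel.refine(Process("FIFO"))
```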

We first discuss the primitive objects. The hierarchical objects are used to refine the communication behavior of the system and are discussed afterwards.
A process is a container for a number of host language encapsulations of a component. A single process can have multiple host language encapsulations describing different implementations for the same component, or for the same component represented at different abstraction levels.

A host language encapsulation describes a component in a specific host language. Preferably, C, C++, DFL, VHDL and Verilog are supported host languages. A CoWare language encapsulation is used to describe the system's structure. In a CoWare language encapsulation, one can instantiate processes and connect their ports with channels.

Other host language encapsulations comprise a context and a number of threads. The context and the threads contain code written in the host language of the encapsulation. The context contains code that is common to all threads in the encapsulation, i.e. variables/signals and functions as allowed by the semantics of the host language. As such, the context provides for inter-thread (intra-process) communication.

Each primitive CoWare process (symbolised by an ellipse 13 in FIGURE 2) encapsulates concurrent program threads in a host language of choice. Concurrent threads communicate over shared memory inside a process. Inter-process communication is over unidirectional channels using a Remote Procedure Call (RPC) protocol. The reasons for this choice will be explained below.

Notice that in this way heterogeneous specification is supported: hardware and software aspects, structural and behavioral aspects, and different specification paradigms (data-flow, control-flow, ...) can be combined.

Co-specification allows a functional specification to be described based on the concept of communicating CoWare processes.

An important concept of CoWare is that basically no distinction is made between co-simulation and co-synthesis. Both are based on the concept of refining the specification for implementation, re-using existing compilers, emulators, and simulation processes.

In refinement for co-synthesis, the designer performs an interactive coarse partitioning of the specification over a user-allocated architecture. This leads to a merger of component-compiler-consistent processes to be mapped on the same component. Component-compiler-consistent processes have an encapsulation in the same host language. Merging consists of in-lining the RPC calls between said processes and leads to two subproblems: the mapping of the concurrent threads in the processes on a processor, re-using existing component compilers 14, 15, 16, 17, 18, 19, and the refinement of the communication between processes into the hardware and software communication protocols that implement it. The implementation of concurrent threads and intra-process communication must be taken care of by using Real-Time Operating Systems (RTOS), micro-kernels or software synthesis in the case of programmable processors, or by providing a library-based communication protocol shell around the existing hardware synthesis tools. Refinement of the inter-process communication again means a refinement of the primitive RPC communication by expanding the communication ports into implementable protocols available in a protocol library 20. It is also possible to assign channel processes to abstract channels.

In principle, all of this is open to the user, who can add his own library of communication protocols. On the other hand, CoWare provides in the SYMPHONY toolbox 21 a methodology for interface synthesis whereby every communication channel is refined by selection of a communication scenario. In this way, automated synthesis of hardware/hardware and hardware/software interfaces, including the generation of the software drivers in programmable processors, is possible. This is an essential part of hardware/software co-design.

After the compilation of all components, all hardware is available as structural VHDL and all software for the processors is in C, which can be compiled with the host compiler of the programmable components. The final step is to link all the synthesis and hardware descriptions to drive commercial back-end tools that generate the layout.

In FIGURE 3, the processes system 22 and subsystem 23 contain a CoWare language encapsulation. The CoWare language encapsulation of system 22 describes how it is built up from an instance of subsystem 23 and an instance of P4 (24). The processes P1 (25), P23 (26), and P4 (24) each contain a C language encapsulation.

Ports are objects through which processes communicate. A primitive port is characterized by a protocol and a data type parameter.

There is one implicit port, the construct port, to which an RPC is performed exactly once at system start-up.

In FIGURE 3, the process P23 (26) has two primitive ports p2 (27) and p3 (28), next to the implicit construct port.

Protocols define the communication semantics of a port. A primitive protocol is one of master, inmaster, outmaster, inoutmaster, slave, inslave, outslave, inoutslave. Each primitive protocol indicates another way of data transport. The in, out, and inout prefixes indicate the direction of the data. The master/slave postfix indicates the direction of the control: whether the protocol activates an RPC (master) or services an RPC (slave). In the remainder of this text, ports with a slave/master protocol are usually referred to as slave/master ports.

In FIGURE 3, master ports (29, 30) are represented by the small shaded rectangles on a process' perimeter. Slave ports (27, 28) are represented by small open rectangles on the perimeter. The data direction of a port is represented by the arrow that connects to the port. In FIGURE 3, port p1 (29) is an outmaster port and port p2 (27) is an inslave port.
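The naming scheme of the primitive protocols can be decoded mechanically. The Python sketch below uses a hypothetical `classify` helper to split a protocol name into its data-direction prefix and its control postfix; it reports no data prefix for plain master and slave, since the text does not spell out their data semantics at this point.

```python
from typing import NamedTuple, Optional

PRIMITIVE_PROTOCOLS = (
    "master", "inmaster", "outmaster", "inoutmaster",
    "slave", "inslave", "outslave", "inoutslave",
)


class ProtocolInfo(NamedTuple):
    data_prefix: Optional[str]  # "in", "out", "inout", or None when there is no prefix
    control: str                # "master" activates an RPC, "slave" services one


def classify(protocol: str) -> ProtocolInfo:
    """Split a primitive protocol name into data prefix and control postfix."""
    if protocol not in PRIMITIVE_PROTOCOLS:
        raise ValueError(f"not a primitive protocol: {protocol}")
    control = "master" if protocol.endswith("master") else "slave"
    prefix = protocol[: -len(control)]
    return ProtocolInfo(prefix or None, control)
```

For port p1 (29) of FIGURE 3, `classify("outmaster")` yields data prefix `"out"` and control `"master"`.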

A protocol may further have an index set. The indices in the index set are used to convey extra information about the data that is transported. For example, the primitive protocol used to model the memory port of a processor will have an index to model the address of the data that is put on the memory port.

A thread is a single flow of control within a process. A thread contains code in the host language of the encapsulation of which the thread is a part. The code in a thread is executed according to the semantics of the host language. We distinguish between slave threads and autonomous threads.

Slave threads are uniquely associated to slave ports, and their code is executed when the slave port is activated (i.e. when an RPC is performed to the slave port). There is one special slave thread, which is associated to the implicit construct port and can be used to initialize the process.

In FIGURE 3, the process P23 (26) contains two regular slave threads (31, 32) associated to the slave ports p2 (27) and p3 (28) respectively, next to the special construct slave thread (33).

Autonomous threads are not associated to any port, and their code is executed, after system initialization, in an infinite loop.

In FIGURE 3, processes P1 (25) and P4 (24) each contain an autonomous thread (34).

A language encapsulation can contain multiple slave and autonomous threads that, in principle, all execute concurrently.

A channel is a point-to-point connection of a master port and a slave port. Two ports that are connected by a channel can exchange data. Channels can be uni- or bi-directional. A primitive channel provides for unbuffered communication. It has no behavior; it is a medium for data transport. In hardware it is implemented with wires. In software it is implemented with a (possibly in-lined) function call. In this way, primitive channels model the basic communication primitives found in software and hardware description languages.

In the strict sense, only point-to-point channels connecting one master to one slave port are allowed. However, a person skilled in the art can easily remove this restriction to allow channels connecting two master ports or two slave ports, or to allow channels connecting multiple slave and master ports.

Such an extended description can be transformed into the basic model by using a default or user-defined translation scheme.

In FIGURE 3, there is a primitive channel (35) that connects port p1 (29) of process P1 (25) with port p2 (27) of process P23 (26).

Communication always happens between two threads. Communication between threads that are part of the same process is denoted as intra-process communication. Communication between threads in different processes is denoted as inter-process communication.

Intra-process (inter-thread) communication is done by making use of shared variables/signals that are declared in the context of the process. How two threads are prevented from accessing the same variable at the same time is host-language dependent. It is the user's responsibility to protect critical sections using the mechanisms provided in the host language.

In FIGURE 3, intra-process communication occurs in process P23 (26).

The variable tmp (36) declared in the context (37) is shared by slave thread p2 (31) and slave thread p3 (32).
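As a sketch of intra-process communication, assume Python as the host language, so that `threading.Lock` is the host-language mechanism the user applies to critical sections. What the slave threads of P23 actually do with `tmp` is not described, so the accumulate-and-read behavior and the class name `P23Context` below are purely hypothetical.

```python
import threading


class P23Context:
    """Hypothetical context of process P23: state shared by its threads."""

    def __init__(self):
        self.tmp = 0                   # shared variable, like tmp (36)
        self.lock = threading.Lock()   # protects the critical sections below

    def slave_p2(self, data):
        # Slave thread for port p2: updates the shared variable.
        with self.lock:
            self.tmp += data

    def slave_p3(self):
        # Slave thread for port p3: reads the value left by p2.
        with self.lock:
            return self.tmp


ctx = P23Context()
workers = [threading.Thread(target=ctx.slave_p2, args=(1,)) for _ in range(100)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Without the lock, the read-modify-write in `slave_p2` would be a race; the text leaves exactly this protection to the user.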

Inter-process (inter-thread) communication with a primitive protocol is RPC based. On a master port, the RPC function can be used to initiate a thread in a remote process. A master port can be accessed from anywhere in the host language encapsulation (context, autonomous threads, slave threads), with the exception of the construct thread.

The RPC function returns when the slave thread has completed, i.e. when all the statements in the slave thread's code have been executed. In the slave thread (uniquely associated with a slave port), the Read and Write functions can be used to access the data of the slave port.

The Index function is used to access the indices of the protocol of the port. The RWbar function is used on an inoutslave port to determine the actual direction of the data transport. A slave port can only be accessed from within its associated slave thread.

A bi-directional port can be used to both send and receive data. However, according to the strict RPC semantics, this cannot be done by the same RPC call. In a single RPC call, one uses the bi-directional port either in the input or in the output direction, but not in both directions. For a person skilled in the art, it is easy to extend the strict RPC semantics to full-fledged function call semantics, where arguments are passed to a remote procedure and results are received back.
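The one-direction-per-call rule for bi-directional ports can be sketched as follows. The method names mirror the `Read`, `Write`, and `RWbar` functions named in the text, but their signatures, the class `InoutSlavePort`, the `rpc_write`/`rpc_read` entry points, and the convention that a true `RWbar` means the master reads are all assumptions made for illustration.

```python
from typing import Callable, Optional


class InoutSlavePort:
    """Hypothetical inoutslave port: each RPC moves data in one direction only."""

    def __init__(self, slave_thread: Callable[["InoutSlavePort"], None]):
        self._slave_thread = slave_thread
        self._data: Optional[int] = None
        self._rwbar = True  # assumed convention: True = master reads, False = master writes

    # Functions available inside the slave thread.
    def RWbar(self) -> bool:
        return self._rwbar

    def Read(self) -> Optional[int]:
        return self._data

    def Write(self, value: int) -> None:
        self._data = value

    # Master side: one RPC, one direction.
    def rpc_write(self, value: int) -> None:
        self._rwbar, self._data = False, value
        self._slave_thread(self)

    def rpc_read(self) -> Optional[int]:
        self._rwbar, self._data = True, None
        self._slave_thread(self)
        return self._data


register = {"value": 0}


def register_thread(port: InoutSlavePort) -> None:
    # Slave thread: RWbar tells it which way the data flows in this RPC.
    if port.RWbar():
        port.Write(register["value"])
    else:
        register["value"] = port.Read()


port = InoutSlavePort(register_thread)
port.rpc_write(7)
```

Reading the value back needs a second, separate RPC (`port.rpc_read()`), which is exactly the restriction the strict semantics impose.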

In FIGURE 3, inter-process communication occurs between processes P1 (25) and P23 (26) over the channel (35). When the RPC statement (38) in the autonomous thread (34) is reached, the value of the local variable data (39) is put on the channel (35), and the control is transferred to the slave thread p2 (31). The autonomous thread (34) is halted until the last statement of the slave thread (31) is executed. After that, the autonomous thread (34) resumes by executing the statement (40) after the RPC statement (38).
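The blocking behavior of a primitive RPC can be sketched with two Python threads: the caller puts its data on the channel and waits until the slave thread has finished. `PrimitiveChannel`, `serve_once`, and the trace strings are hypothetical, and the autonomous thread is limited to two iterations instead of an infinite loop so that the example terminates.

```python
import queue
import threading


class PrimitiveChannel:
    """Sketch of an unbuffered RPC channel between two threads."""

    def __init__(self):
        self._request = queue.Queue(maxsize=1)

    def rpc(self, data):
        # Master side: put data on the channel and block until the slave
        # thread has executed its last statement.
        done = threading.Event()
        self._request.put((data, done))
        done.wait()

    def serve_once(self, slave_thread):
        # Slave side: wait for one RPC, run the slave thread, release the caller.
        data, done = self._request.get()
        slave_thread(data)
        done.set()


trace = []
channel = PrimitiveChannel()


def slave_p2(data):
    trace.append(f"p2 received {data}")


def serve_two_rpcs():
    for _ in range(2):
        channel.serve_once(slave_p2)


def autonomous_p1():
    for data in (1, 2):
        trace.append(f"P1 RPC {data}")
        channel.rpc(data)                 # halted until slave_p2 has finished
        trace.append(f"P1 resumes after {data}")


server = threading.Thread(target=serve_two_rpcs)
server.start()
autonomous_p1()
server.join()
```

Because the caller only resumes after `done` is set, the trace always interleaves as RPC, slave execution, resume.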

By using primitive channels, ports, and protocols, the designer first concentrates on the functionality of the system while abstracting from terminals, signals, and handshakes. Once the designer is convinced that the processes of the system are functionally correct, the communication behavior of the system can be refined. Communication refinement in CoWare is carried out by making the objects involved in the communication (channels, ports, and protocols) hierarchical.

Hierarchical channels are processes that assign a given communication behavior to a primitive channel. The behavioral interface of a hierarchical channel is fixed by the ports connected by the primitive channel. Making a channel hierarchical can drastically change the communication behavior of two connected processes. It can, for example, parallelize (pipeline) the processes by adding buffers. The one property that is preserved by making a channel hierarchical is the direction of the data transport.

In FIGURE 4, the primitive channel (41) between processes P1 (42) and P23 (43) is refined into a hierarchical channel (44) with FIFO behavior. The FIFO hierarchical channel (44) decouples the autonomous thread of process P1 (42) and the slave thread associated with port p2 (45) of process P23 (43). The effect is that the rate at which process P1 (42) can issue RPCs is no longer determined by the rate at which process P23 (43) can service the RPCs. The FIFO hierarchical channel (44) takes care of the necessary buffering of data.
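A FIFO hierarchical channel can be sketched as an object that acts as a slave towards the producer and as a master towards the consumer. The class name `FifoChannel`, the buffer depth of 4, and the choice to raise an error on overflow (rather than block the producer) are assumptions; the sketch runs sequentially to show only that producer RPCs no longer wait for the consumer.

```python
from collections import deque


class FifoChannel:
    """Sketch of a FIFO hierarchical channel between a producer and a consumer."""

    def __init__(self, consumer_slave_thread, depth=4):
        self._buffer = deque()
        self._consumer = consumer_slave_thread
        self._depth = depth

    def rpc(self, data):
        # Slave side: called by the producer (e.g. P1); returns at once.
        if len(self._buffer) >= self._depth:
            raise OverflowError("FIFO full; a real channel might block the producer")
        self._buffer.append(data)

    def forward(self):
        # Master side: one RPC to the consumer (e.g. slave thread p2 of P23).
        if self._buffer:
            self._consumer(self._buffer.popleft())


received = []
fifo = FifoChannel(received.append, depth=4)
for sample in (10, 20, 30):      # producer issues three RPCs back to back
    fifo.rpc(sample)
delivered_before = len(received)  # consumer has not serviced anything yet
for _ in range(3):
    fifo.forward()
```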

Hierarchical ports are processes that assign a given communication behavior to a primitive port. The behavioral interface of the hierarchical port is partially fixed by the primitive port it refines. The hierarchical port process should have one port, which we call the return port, that is compatible with the primitive port. Making a primitive port hierarchical preserves the data direction (in/out).

Two ports are compatible if their primitive protocols are compatible, if they have equal data types, and if they have equal protocol indices. The following primitive protocols are compatible: (master, slave); (inslave, outmaster); (inslave, inoutmaster); (outslave, inmaster); (outslave, inoutmaster); (inoutslave, inmaster); (inoutslave, outmaster); (inoutslave, inoutmaster). Two hierarchical protocols are compatible if their primitive protocols are compatible and they have the same name.
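The compatibility rules above translate directly into a check. This Python sketch encodes the listed pairs; `PortSpec`, its fields, and the helper names are hypothetical, and treating a hierarchical port as incompatible with a still-primitive one is an assumption, since the text only covers the case of two hierarchical protocols.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Compatible pairs of primitive protocols, as listed in the text.
COMPATIBLE = {
    ("master", "slave"),
    ("inslave", "outmaster"), ("inslave", "inoutmaster"),
    ("outslave", "inmaster"), ("outslave", "inoutmaster"),
    ("inoutslave", "inmaster"), ("inoutslave", "outmaster"),
    ("inoutslave", "inoutmaster"),
}


@dataclass(frozen=True)
class PortSpec:
    primitive_protocol: str
    data_type: str
    indices: Tuple[str, ...] = ()
    hierarchical_name: Optional[str] = None  # e.g. "RS232" once refined


def protocols_compatible(a: str, b: str) -> bool:
    # The relation is symmetric, so check both orders.
    return (a, b) in COMPATIBLE or (b, a) in COMPATIBLE


def ports_compatible(a: PortSpec, b: PortSpec) -> bool:
    if not protocols_compatible(a.primitive_protocol, b.primitive_protocol):
        return False
    if a.data_type != b.data_type or a.indices != b.indices:
        return False
    # Hierarchical protocols must also carry the same name (assumed: a
    # hierarchical and a primitive port never match).
    return a.hierarchical_name == b.hierarchical_name
```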

In FIGURE 4, we impose a certain data formatting for the data transported over the channel (41) between process P1 (42) and the FIFO hierarchical channel (44). This is achieved by making the primitive ports p1 (46) and left (47) hierarchical. The format process (48) that refines port p1 (46) might, for example, add a cyclic redundancy check to the data that is transported. The unformat process (49) that refines port left (47) of the FIFO hierarchical channel (44) then uses this cyclic redundancy check to determine whether the received data is valid. The actual data and the cyclic redundancy check are sent sequentially over the same primitive channel between ports op (50) and ip (51). As a consequence, the data rate between the format (48) and unformat (49) processes is twice that of process P1 (42).
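The format/unformat pair can be sketched as follows. The text does not name a CRC polynomial, so the sketch assumes CRC-32 as computed by Python's `zlib.crc32`; `format_rpc`, `Unformat`, and `channel_log` are hypothetical names. Every datum costs two transfers on the primitive channel, which is where the doubled data rate comes from.

```python
import zlib

channel_log = []  # every word carried by the primitive channel between op and ip


def format_rpc(data: bytes, send) -> None:
    # Format process (refines port p1): sends the data, then its CRC,
    # sequentially over the same primitive channel.
    send(data)
    send(zlib.crc32(data).to_bytes(4, "big"))


class Unformat:
    # Unformat process (refines port left): pairs each datum with the CRC
    # that follows it and keeps only data whose CRC matches.
    def __init__(self):
        self._pending = None
        self.valid = []
        self.rejected = 0

    def receive(self, word: bytes) -> None:
        channel_log.append(word)
        if self._pending is None:
            self._pending = word
            return
        data, self._pending = self._pending, None
        if zlib.crc32(data).to_bytes(4, "big") == word:
            self.valid.append(data)
        else:
            self.rejected += 1


rx = Unformat()
for sample in (b"abc", b"def"):
    format_rpc(sample, rx.receive)
```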

Primitive protocols provide a classification of all hierarchical protocols. A primitive protocol determines the communication semantics, but not the communication implementation: it does not fix the timing diagram used in the communication. Hierarchical protocols refine primitive protocols with a timing diagram and the associated I/O terminals. Hierarchical protocols are high-level models for alternative implementations of a primitive protocol: they preserve both the data direction and the control direction of the primitive protocol.

To access the terminals of the hierarchical protocol, a hierarchical port is introduced at the same time. The terminals can be accessed from within the thread code by using the functions Put, Sample, and Wait.

In FIGURE 4, the primitive protocols of the op port (50) and the ip port (51) of the format (48) and unformat (49) processes are refined into an RS232 protocol (52). In the RS232 hierarchical port (53), an RPC (54) issued in the format process (48) on the op port (50) is converted into manipulations of the terminals (55) according to a timing diagram (56).
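The text does not give the RS232 timing diagram (56), so the sketch below assumes common 8N1 asynchronous framing: one start bit at 0, eight data bits least significant first, one stop bit at 1. `TxdTerminal`, its `put` and `wait` methods (hypothetical stand-ins for the Put and Wait functions), and `rs232_port_rpc` are illustrative names, with time measured in abstract bit periods.

```python
from typing import List, Tuple


class TxdTerminal:
    """Hypothetical model of a transmit terminal: records (time, level) pairs."""

    def __init__(self):
        self.time = 0
        self.trace: List[Tuple[int, int]] = []

    def put(self, level: int) -> None:   # plays the role of Put
        self.trace.append((self.time, level))

    def wait(self, ticks: int) -> None:  # plays the role of Wait
        self.time += ticks


def rs232_port_rpc(byte: int, txd: TxdTerminal, bit_time: int = 1) -> None:
    # Converts one RPC on port op into terminal manipulations following an
    # assumed 8N1 timing diagram: start bit, 8 data bits LSB first, stop bit.
    frame = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    for level in frame:
        txd.put(level)
        txd.wait(bit_time)


txd = TxdTerminal()
rs232_port_rpc(0x41, txd)
```

The process issuing the RPC sees none of this: it still performs a single RPC on op, and only the hierarchical port knows about terminals and timing.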

The CoWare model is implemented on a computer or on a plurality of computers, and a set of application programmer's interface (API) functions is available.

When a CoWare system description is parsed, a representation of the system is created in memory in which the objects of the description are related to each other. All tools of the CoWare environment use these API functions to analyze, manipulate, and refine the system description.

Due to the selection of RPC as inter-process communication, the classification of protocols, and the structuring of a process in encapsulations with context and threads, a process merge transformation can be implemented.

The goal of this transformation is to combine a number of process instances that are described in the same host language into a single host language encapsulation that can then be mapped by a host language compiler onto a single processor.

In the process of merging, all remote procedure calls are in-lined: each slave thread is in-lined in the code of the thread that calls it through an RPC statement. Because of the semantics of RPC communication, this transformation does not alter the behavior of the original system, provided that care is taken to avoid name clashes. The result of merging is a single host language encapsulation that contains a single context, a single construct thread, one or more autonomous threads, possibly multiple slave threads to service RPC requests from external processes (not involved in the process merge transformation), and possibly multiple RPCs to slave threads in external processes (not involved in the process merge transformation).
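A minimal Python sketch of the merge idea follows. Python cannot in-line source text the way a host-language compiler would, so the sketch instead resolves each RPC between merged instances to a direct call of the target slave thread; the `merge` function, its argument layout, and the `<instance>_<port>` renaming used to avoid name clashes are hypothetical. Contexts, construct threads, and autonomous threads are left out.

```python
from functools import partial
from typing import Callable, Dict, Tuple

Rpc = Callable[[str, int], None]             # rpc(master_port, data)
SlaveThread = Callable[[int, Rpc], None]     # slave_thread(data, rpc)


def merge(processes: Dict[str, Dict[str, SlaveThread]],
          channels: Dict[Tuple[str, str], Tuple[str, str]],
          external_rpc: Callable[[str, str, int], None]) -> Dict[str, Callable[[int], None]]:
    """Merge instances into one encapsulation; internal RPCs become direct calls.

    processes: instance name -> {slave port name: slave thread}
    channels:  (instance, master port) -> (instance, slave port), internal only
    Returns the slave threads still visible to external processes.
    """
    rpcs: Dict[str, Rpc] = {}

    def make_rpc(instance: str) -> Rpc:
        # Resolve the master ports of 'instance' once, at merge time.
        direct = {port: target for (src, port), target in channels.items() if src == instance}

        def rpc(port: str, data: int) -> None:
            if port in direct:
                target_instance, target_port = direct[port]
                # The slave thread runs directly in the caller: no channel involved.
                processes[target_instance][target_port](data, rpcs[target_instance])
            else:
                external_rpc(instance, port, data)  # RPC to a process outside the merge
        return rpc

    for instance in processes:
        rpcs[instance] = make_rpc(instance)

    internally_served = set(channels.values())
    return {
        f"{instance}_{port}": partial(thread, rpc=rpcs[instance])
        for instance, slaves in processes.items()
        for port, thread in slaves.items()
        if (instance, port) not in internally_served
    }


external_calls = []


def a_in(data, rpc):
    rpc("to_b", data + 1)      # RPC to instance B, which takes part in the merge


def b_in(data, rpc):
    rpc("to_ext", data * 2)    # RPC to a process outside the merge


merged = merge({"A": {"in": a_in}, "B": {"in": b_in}},
               {("A", "to_b"): ("B", "in")},
               lambda instance, port, data: external_calls.append((instance, port, data)))
merged["A_in"](5)
```

B's slave thread disappears from the merged interface because only A called it, while the RPC leaving B is preserved as an external RPC.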

FIGURE 5 shows the effect of merging the process instances (25, 26) in the subsystem process (23) of FIGURE 3. The subsystem process (57) has a CoWare language encapsulation. After merging the instances (58, 59), we obtain a C language encapsulation (60), which is added to the subsystem process.

The benefit of merging processes is that the in-lining transformation eliminates the overhead that accompanies the execution of (remote) procedure calls. It further reduces the number of concurrent threads and, therefore, the overhead that accompanies the switching between threads. Finally, it allows the host language compilers to optimize across the boundaries of the original processes.

The port and protocol hierarchy provides a clear separation between functional and communication behavior. Traditionally, the description of a component contains both functional and communication behavior in an interleaved way. When such a component has to be re-used in an environment other than the one it was intended for, the designer has to change those parts of the description of the component that have to do with communication. In CoWare, a component's behavior is described by a process that makes use of RPC to communicate with the outside world. Such processes can be connected with each other without modifying their description (modularity). By using primitive ports and primitive protocols, the designer concentrates on the functionality of the system while abstracting from terminals, signals, and handshakes.

Later, when the component is instantiated in a system, the primitive protocol is refined into the best-suited hierarchical protocol, taking into account the other system components involved. This fixes the timing diagram and terminals used to communicate over that port. The port containing the hierarchical protocol is made hierarchical to add the required communication behavior that implements the timing diagram of the selected hierarchical protocol. Again, this is achieved without modifying the description of any of the processes involved.

Because of this property, it is feasible to construct libraries of functional building blocks and libraries of communication blocks that are re-usable: they can be plugged together without modifying their description. After blocks have been plugged together, any communication overhead (chains of remote procedure calls) can be removed by in-lining the slave threads that serve the RPCs. The result is a description of the component in which function and communication are interleaved seamlessly and which can be compiled into software or hardware as efficiently as a description in the traditional design process.

The above method reduces the number of protocol conversions needed at the system level and allows the selection of the communication protocol and its implementation to be postponed until late in the design process, in this way achieving the requirements of "design for re-use". The concept of hierarchical protocols is also useful to model off-the-shelf components ("re-use of designs"), because the timing diagrams according to which a processor communicates can be captured by it.

The input to the implementation refinement process is a functional specification: a CoWare language encapsulation consisting of a number of process instances (i.e. host language encapsulations), exhibiting both intra-process and inter-process communication behavior. In a first step, allocation is performed. In this step, the number and type of processors that will serve as the target for implementing the input specification are selected. After allocating the necessary processor resources, an assignment step is performed. In this step, each process instance of the input specification is assigned to one of the allocated processors.

The rest of the implementation path is illustrated in FIGURE 6.

All process instances bound to a single processor are merged. This results in a system with a one-to-one mapping between (merged) processes and allocated processors. In FIGURE 6, all DFL processes (61) are merged (62) into a single DFL process (63).
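The grouping that precedes the merge can be sketched as below. The instance and processor names are hypothetical, and `group_for_merge` is an illustrative helper; the sketch also checks that every group shares one host language, following the requirement that merged processes be component-compiler consistent.

```python
from collections import defaultdict
from typing import Dict, List


def group_for_merge(assignment: Dict[str, str],
                    languages: Dict[str, str]) -> Dict[str, List[str]]:
    """Group process instances by their assigned processor.

    Each group is merged into one process, giving a one-to-one mapping
    between merged processes and allocated processors.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for instance, processor in assignment.items():
        groups[processor].append(instance)
    for processor, instances in groups.items():
        langs = {languages[i] for i in instances}
        if len(langs) != 1:
            raise ValueError(f"{processor}: mixed host languages {sorted(langs)}")
    return dict(groups)


groups = group_for_merge(
    {"filter": "asic", "decimate": "asic", "ctrl": "arm"},
    {"filter": "DFL", "decimate": "DFL", "ctrl": "C"},
)
```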

The host language encapsulation of each (merged) process instance now has to be compiled onto its processor target using a host language compiler. This comprises the following steps:

(1) The CoWare concepts of autonomous thread, slave thread, and shared context are implemented on the processor target.

(2) The inter-process communication is implemented.

(3) The resulting processors are encapsulated so that they can be connected with the rest of the system.

In step (1), existing (commercial) host language compilers are re-used. When such a host language compiler does not directly support the CoWare concepts of autonomous thread, slave thread, and shared context, the CoWare environment supports two alternatives:

- the host language compiler is extended with libraries that support such concepts (multi-thread library);

- software synthesis is performed to translate the host language encapsulation into a description that can be handled by the host language compiler.
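One way a translation step can map autonomous threads onto a compiler without thread support is a cooperative round-robin scheduler. The Python generator version below is a swapped-in illustration of that general technique, not the CoWare software synthesis itself; `round_robin` and `autonomous` are hypothetical names, and the scheduler runs a fixed number of rounds instead of forever.

```python
from typing import Generator, List

Thread = Generator[None, None, None]


def round_robin(threads: List[Thread], rounds: int) -> None:
    """Run 'rounds' scheduling rounds over cooperative threads.

    Each thread yields where a context switch is allowed; a thread that
    finishes is dropped. Autonomous threads normally never finish.
    """
    ready = list(threads)
    for _ in range(rounds):
        for t in list(ready):
            try:
                next(t)
            except StopIteration:
                ready.remove(t)


log: List[str] = []


def autonomous(name: str) -> Thread:
    count = 0
    while True:                  # autonomous threads run in an infinite loop
        count += 1
        log.append(f"{name}{count}")
        yield                    # point where the scheduler may switch threads


round_robin([autonomous("a"), autonomous("b")], rounds=2)
```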

In FIGURE 6, the DFL process (63) is compiled with the Cathedral compiler (64). The result is a VHDL netlist (65) of the implementation.

In step (2), when the process is compiled on a non-programmable processor, the implementation of inter-process communication comprises the steps of refining the primitive ports/protocols into appropriate hierarchical ports/protocols and merging the hierarchical ports with the process. In FIGURE 6, hierarchical ports (66) are added to the VHDL processes (67). After merging (68) all processes (67), the resulting VHDL process (69) is compiled with the Synopsys compiler (70).

In step (2), when the process is compiled on a programmable processor, the implementation of inter-process communication comprises the steps of generating device drivers and of generating hardware interfaces. A software tool is used to achieve this. In the sequel, this tool is called SYMPHONY. FIGURE 7 illustrates the tool SYMPHONY.

SYMPHONY makes use of a software model (71) and a hardware model (72) of the target processor and a library of I/O scenarios (73) for that processor. The hardware model (72) of the programmable processor core consists of an HDL host-language encapsulation that formalizes the information that is available in the hardware section of the data sheet of the programmable processor core. The HDL host-language encapsulation of the hardware model is characterized by a behavioral interface that conforms to the hardware interface of the programmable processor. All ports (74) have hierarchical protocols: they consist of terminals and a timing diagram. The hardware model may also contain an HDL description of either a black box, a simulation model, or the full description of the processor core.

The software model (71) of the programmable processor core consists of a software host-language encapsulation that formalizes the information that is available in the software section of the data sheet of the programmable processor. The software host-language encapsulation of the software model is characterized by a behavioral interface that conforms to the software interface of the programmable processor core. All ports (75) have primitive protocols. The software model identifies, for example, what ports can be used to get data in or out of the processor core (memory mapped, coprocessor port, ...), and what ports can be used as interrupt ports and what their characteristics are (interrupt priority, maskable interrupt, ...). In addition, the software model contains a behavioral description that allows a software host language encapsulation to be compiled into machine code, for example functions to manage processor-specific actions such as installing an interrupt vector, enabling/disabling interrupts, etc. In Figure 7, the software model of the ARM-S RISC processor is shown. A number of its ports (75) are shown in Figure 7. The memory port mem is modeled as a (bi-directional) slave port. This slave port is accessed by the device drivers, by means of an RPC, to write/read data to/from the external hardware. The slave thread attached to the mem slave port, modeled in the software model, translates the incoming RPC into a memory access. SYMPHONY makes the connection between the device drivers and the mem port. The fiq port is modeled as a master port. The software model of the ARM processor ensures that an RPC to this port is performed every time the processor detects that an interrupt has occurred. SYMPHONY connects the fiq port to a slave thread that serves the interrupt service routine, so that this routine is started automatically.

In Figure 7, the hardware model of the ARM-S RISC processor is shown. A number of its ports (74) are shown in Figure 7. The memory port mem is modeled as a (bi-directional) master port. The hardware model of the ARM performs an RPC to this port every time it wants to write/read data to/from the RAM, ROM or memory-mapped hardware. SYMPHONY connects the mem port to a slave thread in the hardware interface, which does the address decoding and forwards the RPC to the appropriate hardware block (RAM, ROM, memory-mapped hardware).
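The address-decoding slave thread in the hardware interface can be sketched as a table lookup. The address map, the block interface `(offset, is_write, data)`, and all names below are hypothetical; in the described flow the interface is generated as hardware by SYMPHONY rather than written in Python.

```python
from typing import Callable, List, Tuple

# A hardware block serves one forwarded RPC: (offset, is_write, data) -> read data.
Block = Callable[[int, bool, int], int]


class AddressDecoder:
    """Sketch of the hardware-interface slave thread attached to the mem port."""

    def __init__(self, regions: List[Tuple[int, int, Block]]):
        self._regions = regions  # (base address, size, block)

    def mem_rpc(self, address: int, is_write: bool, data: int = 0) -> int:
        # Decode the address and forward the RPC to the matching block.
        for base, size, block in self._regions:
            if base <= address < base + size:
                return block(address - base, is_write, data)
        raise ValueError(f"unmapped address {address:#x}")


def make_ram(size: int) -> Block:
    cells = [0] * size

    def ram(offset: int, is_write: bool, data: int) -> int:
        if is_write:
            cells[offset] = data
        return cells[offset]
    return ram


def make_rom(contents: List[int]) -> Block:
    def rom(offset: int, is_write: bool, data: int) -> int:
        return contents[offset]  # writes to ROM are ignored
    return rom


peripheral_writes: List[int] = []


def peripheral(offset: int, is_write: bool, data: int) -> int:
    # Hypothetical memory-mapped hardware block that records what it receives.
    if is_write:
        peripheral_writes.append(data)
    return 0


decoder = AddressDecoder([
    (0x00000, 0x4, make_rom([0x11, 0x22, 0x33, 0x44])),
    (0x04000, 0x100, make_ram(0x100)),
    (0x08000, 0x4, peripheral),
])
```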

The fiq port is modeled as a slave port in the hardware model. This slave port is activated by an RPC that is performed to the port by the hardware interface. The slave thread attached to the fiq port (and modelled in the hardware model) sets the corresponding flag in the status register to the appropriate value, signalling the interrupt request.

The link between events in the hardware model (RPCs to ports, starting of slave threads attached to slave ports) and events in the software model is taken care of by the processor hardware or, in the case of simulation, by the instruction-set simulator for the ARM.

SYMPHONY is based on the observation that programmable processors have a number of common communication methods to get data in or out of the processors. These communication methods are modeled by I/O scenarios. An I/O scenario describes one way of using the ports of a specific processor core to map a particular port of a software host-language encapsulation to an equivalent port in hardware, thereby crossing the processor core boundary while maintaining the communication semantics. FIGURE 8 shows an example of an I/O scenario. It consists of a software host-language encapsulation and a hardware host-language encapsulation that describe a software I/O driver and the hardware counterpart, respectively. An I/O scenario is also tagged with some performance figures that allow the designer or SYMPHONY to decide which I/O scenario to use for which port.
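
The following sketch shows one way performance figures could drive that choice. The figure names (`latency_cycles`, `extra_gates`), the selection rule (meet the latency budget first, then minimize extra hardware) and the numbers in the example library are all assumptions made for illustration. The text does not specify which figures SYMPHONY records or how it weighs them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IOScenario:
    name: str
    port_kind: str       # kind of software port it implements, e.g. "outmaster"
    latency_cycles: int  # hypothetical performance figure
    extra_gates: int     # hypothetical cost of the hardware counterpart


def pick_scenario(library: List[IOScenario], port_kind: str,
                  max_latency: int) -> Optional[IOScenario]:
    """Among applicable scenarios that meet the latency budget, take the cheapest."""
    candidates = [s for s in library
                  if s.port_kind == port_kind and s.latency_cycles <= max_latency]
    return min(candidates, key=lambda s: (s.extra_gates, s.latency_cycles),
               default=None)


# Illustrative library entries; the numbers are invented for the example.
LIBRARY = [
    IOScenario("memory-mapped", "outmaster", 4, 300),
    IOScenario("instruction-programmed", "outmaster", 2, 500),
    IOScenario("memory-mapped + fiq", "inslave", 20, 900),
]
```

Returning `None` when nothing fits mirrors the situation in which no library scenario is applicable and the designer has to intervene.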

The I/O scenario (78) of FIGURE 8 shows how an outmaster port psw (79) in software can be mapped to an outmaster port phw (80) in hardware, thereby using the memory port (81) of an ARM6 RISC processor core. The software process encapsulation P1 (82) represents the software I/O driver and copies data from port psw (79) to a specific memory address 0x08000 via an RPC call (84). The hardware process encapsulation P2 (83) represents the hardware counterpart and checks whether the memory address bus of the ARM (modeled by the protocol indices of the memory port (81)) equals 0x08000. If this is the case, the data residing on the memory data bus of the ARM is copied to port phw (80) via an RPC call (85).
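
The cooperation of P1 and P2 can be sketched as follows. This is an illustrative Python model, not the code of FIGURE 8. The bus is reduced to an object holding the last address and data word, and the names `MemoryBus`, `p1_software_driver` and `P2HardwareCounterpart` are hypothetical.

```python
from typing import Callable, List, Optional

PSW_ADDRESS = 0x08000  # memory address used by the scenario of FIGURE 8


class MemoryBus:
    """ARM memory port as seen from hardware: address bus, data bus, observers."""

    def __init__(self) -> None:
        self.address: Optional[int] = None
        self.data: Optional[int] = None
        self.observers: List[Callable[["MemoryBus"], None]] = []

    def store(self, address: int, data: int) -> None:
        # One write cycle: drive the buses, then let the hardware react.
        self.address, self.data = address, data
        for observer in self.observers:
            observer(self)


def p1_software_driver(bus: MemoryBus, value: int) -> None:
    """P1: a value arriving on psw is written to the fixed address."""
    bus.store(PSW_ADDRESS, value)


class P2HardwareCounterpart:
    """P2: forwards the data bus to phw when the address bus matches."""

    def __init__(self, phw_rpc: Callable[[int], None]) -> None:
        self.phw_rpc = phw_rpc

    def on_bus_cycle(self, bus: MemoryBus) -> None:
        if bus.address == PSW_ADDRESS and bus.data is not None:
            self.phw_rpc(bus.data)


received: List[int] = []
bus = MemoryBus()
bus.observers.append(P2HardwareCounterpart(received.append).on_bus_cycle)
p1_software_driver(bus, 0x5A)
bus.store(0x04000, 1)  # an ordinary memory write is ignored by P2
```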

The library of I/O scenarios comprises: 

• Memory-Mapped I/O scenarios. These provide a data-transfer mechanism that is convenient because it does not require the use of special processor instructions and can implement practically as many input or output ports as desired. In memory-mapped I/O, portions of the address space are assigned to input and output ports. Reads and writes to those addresses are interpreted as commands to the I/O ports. 'Sending' to a memory-mapped location involves effectively executing a 'Store' instruction on a pseudo-memory location connected to an output port, and 'Receiving' from a memory-mapped location involves effectively executing a 'Load' instruction on a pseudo-memory location connected to an input port. When these memory operations are executed on the portions of address space assigned to memory-mapped I/O, the memory system ignores the operation. The I/O unit, however, sees the operation and performs the corresponding operation on the connected I/O ports. The number of memory locations assigned for memory-mapped I/O depends on the number of ports that a software processor component has to 'physically' implement. SYMPHONY proposes an assignment of address locations to channels that results in simple address decoding logic. However, the user can always override the proposed assignment.

• Instruction-Programmed I/O scenarios. Some processors also provide special instructions for accessing special I/O ports provided with the processor itself. Using this scheme, these special communication ports of the processor are connected to the external channels via the I/O unit. In addition to providing hardware support for memory-mapped and instruction-programmed I/O, the I/O unit also provides support for hardware interrupt control. Interrupts are used for different purposes, including the coordination of interrupt-driven I/O transfers. Different processors provide different degrees of hardware interrupt support. Some processors provide direct access to a number of dedicated interrupt signals. Our I/O unit architecture makes use of these signals when available. If more interrupt 'channels' are required, for example to support a number of interrupt-driven communication channels, we use the strategy of interrupt vectors. Interrupt vectors are pointers or addresses that tell the processor core where to jump to for the interrupt service routine. In effect, this is a kind of memory-mapped interrupt handling.
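
One way to obtain the simple address decoding logic mentioned for memory-mapped I/O is to place all channels in one aligned window whose size is a power of two. A single comparison of the upper address bits then selects the I/O unit, and the lower bits select the channel. The sketch below assumes word addressing and uses hypothetical channel names. It illustrates the idea and is not SYMPHONY's actual assignment algorithm.

```python
from typing import Dict, List, Optional, Tuple


def assign_addresses(channels: List[str], base: int) -> Tuple[Dict[str, int], int]:
    """Give each channel one word inside an aligned power-of-two window."""
    window = 1
    while window < len(channels):
        window *= 2
    if base % window != 0:
        raise ValueError(f"base {base:#x} is not aligned to window size {window}")
    return {name: base + i for i, name in enumerate(channels)}, window


def decode(addr: int, base: int, window: int) -> Optional[int]:
    """Return the channel index selected by addr, or None outside the window."""
    if (addr & ~(window - 1)) != base:  # compare only the upper address bits
        return None
    return addr & (window - 1)          # lower bits index the channel


mapping, window = assign_addresses(["corr_data", "corr_ready", "par"], 0x8000)
```

With three channels the window is rounded up to four words, so one address slot stays unused. That small waste is the price of decoding that needs no magnitude comparators.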

Once an I/O scenario is selected for every port of the software host-language encapsulation, SYMPHONY generates the necessary communication software and the corresponding hardware I/O unit by combining the selected I/O scenarios. The generated communication software, the software model of the processor core and the software host-language encapsulation itself are merged and compiled with the processor-specific C compiler.

In addition, SYMPHONY adds RAM and ROM blocks to store the program code and data (in Figure 7, the RAM and ROM are not shown explicitly: they are part of the HW interface). The result of SYMPHONY is a refinement of the original host language encapsulation into a CoWare encapsulation whose behavioral interface is identical to that of the original encapsulation. SYMPHONY effectively replaces a software encapsulation by a hardware encapsulation that has equivalent functionality.

In step (3), when two processors are not protocol compatible, a protocol conversion process is inserted. In FIGURE 6, the processor (65) compiled with Cathedral-2/3 and the off-the-shelf processor (76) have incompatible protocols. Protocol conversion (77) is required to make them compatible.

A digital system in the CoWare design environment can be simulated. Simulation is an implementation of the digital system on one or more general-purpose computers. The implementation process outlined above can be followed to construct a simulation. FIGURE 9 illustrates the construction of a simulation. For simulation, the target processors are simulators (86) running on processes (87) of the operating systems (88) that run on the general-purpose computers. Allocation and assignment determine the simulation architecture. Arbitrary simulation architectures are supported by the CoWare design environment. Support is provided to select an optimal architecture for a given simulation speed and debugging visibility.



The host language compilers mentioned in step (1) are now the simulation engines for the host languages.

In step (2), the inter-process communication now consists of two parts. In the first part, the communication is realized from the simulation engine to the OS process on which the simulation engine is running. In the second part, the communication is realized between two different OS processes over the OS and network layer. The communication between the simulation engine and the OS process is performed via the application programmer's interface of the simulation engine. The communication between two different OS processes is done through the OS inter-process communication primitives (e.g. shared memory and semaphores for two processes on a single OS, or TCP/IP sockets for two processes on distinct computers).
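
The second part of this communication, between two OS processes, can be sketched as an RPC carried over a stream socket. In this minimal Python illustration, `socket.socketpair()` stands in for a TCP connection between two computers (with real hosts one would use a listening socket and `socket.create_connection` instead). A background thread plays the second simulator process. The one-line JSON message format is an assumption and not the format used by the CoWare environment.

```python
import json
import socket
import threading
from typing import Any, Callable


def serve_one_rpc(conn: socket.socket, slave_thread: Callable[..., Any]) -> None:
    """Remote side: read one request, run the slave thread, send back the result."""
    with conn.makefile("rw", encoding="utf-8") as stream:
        request = json.loads(stream.readline())
        result = slave_thread(*request["args"])
        stream.write(json.dumps({"result": result}) + "\n")
        stream.flush()


def remote_rpc(conn: socket.socket, *args: Any) -> Any:
    """Local side: send the request and block until the reply arrives."""
    with conn.makefile("rw", encoding="utf-8") as stream:
        stream.write(json.dumps({"args": list(args)}) + "\n")
        stream.flush()
        return json.loads(stream.readline())["result"]


local_end, remote_end = socket.socketpair()
worker = threading.Thread(target=serve_one_rpc,
                          args=(remote_end, lambda a, b: a + b))
worker.start()
answer = remote_rpc(local_end, 2, 3)
worker.join()
local_end.close()
remote_end.close()
```

Closing the file objects returned by `makefile` does not close the underlying sockets, which is why the sockets are closed explicitly at the end.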

When the simulation engine used has a fixed interface, as for example an instruction-set simulator for a programmable processor, the hardware/software interface is generated with SYMPHONY and can be simulated as any other process.

The CoWare design environment supports multi-abstraction-level simulation, which is the key to efficient co-simulation. It allows the processes under debug to be simulated at an appropriately low level of abstraction for debugging purposes, while the other processes in the system are simulated at the highest appropriate abstraction level for maximal speed. The time-consuming low-abstraction-level simulation is limited to the smallest possible part of the system under simulation, while these parts can still be simulated in the system context.

Because both simulation and implementation follow the same design process, it is possible to construct hybrid simulation architectures in which part of the system is implemented by simulators running on OS processes and part of the system is implemented by actual hardware. This is just one more manifestation of the heterogeneity of digital system architectures.

In a specific embodiment of the present invention, an application of hardware/software co-design to a pager is disclosed in the sequel.

SPECIFICATION OF THE PAGER 

Each block (89) in FIGURE 10 corresponds to a process implementing a specific function of the pager. This functional decomposition determines the initial partitioning. The arrows (90) in between the processes represent primitive channels with RPC semantics.

FIGURE 11 shows the RPC communication in detail for part of the pager design. The blocks (91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101) in FIGURE 11 correspond to the processes (89) from FIGURE 10.

The Sample Clock Generator process (94) contains an autonomous thread (102). This thread runs continuously. It performs an RPC (103) over its input port ip (104) to the Tracking & Acquisition process (93) to obtain a new value for delta. The autonomous thread (102) of the process (94) adds the delta parameter to an internal variable until a threshold is exceeded. In this way it implements a sawtooth function. When the sawtooth exceeds the (fixed) threshold, an RPC call (105) is issued to the A/D converter process (95). The autonomous thread (102) of the Sample Clock Generator (94) thus performs an RPC (105) (gives a sample clock tick) every threshold/delta iterations (real clock cycles).
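
The sawtooth described above behaves like a phase accumulator. The following is a minimal sketch that assumes the accumulator keeps the remainder on overflow (the text does not say how the sawtooth is reset) and models the RPC over port ip as a callable `delta_source`. Both the function name and the reset rule are assumptions.

```python
from typing import Callable, List


def sample_clock_ticks(delta_source: Callable[[], int],
                       threshold: int, cycles: int) -> List[int]:
    """Return the clock cycles on which a sample clock tick (RPC) is issued."""
    acc = 0
    ticks = []
    for cycle in range(cycles):
        acc += delta_source()     # RPC over ip: fetch the current delta
        if acc > threshold:       # the sawtooth exceeds the fixed threshold
            acc -= threshold      # assumption: wrap around, keeping the remainder
            ticks.append(cycle)   # RPC to the A/D converter (one sample tick)
    return ticks


ticks = sample_clock_ticks(lambda: 2, threshold=8, cycles=20)
```

With delta 2 and threshold 8, the ticks are spaced threshold/delta = 4 cycles apart. A larger delta supplied by the Tracking & Acquisition process shortens the period, which is how that process controls the sample clock frequency.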

The slave thread clock (106) in the A/D converter process (95) samples the analogue input and sends the result to the Down-conversion process (100) via an RPC call (107). This in turn activates the Decimation process (99) via an RPC call, and so on.

The Correlator Noise Estimator process (98) contains a slave thread (108) associated with port ip (109) to compute the correlation values. This slave thread (108) is activated when the Phase Correction process (97) writes data to the Correlator Noise Estimator process (98) (i.e. when the Phase Correction process (97) performs an RPC (110) to the ip (109) port of the Correlator Noise Estimator process (98)). The slave thread (108) reads in the data and then performs an RPC (111) to the User Interface process (91) to obtain a new value for the parameter par it requires for computing the correlation values. Finally, the new correlation results are sent to the Tracking Acquisition process (93) via an RPC call (112) on its op port (113).

The slave thread ip (114) in the Tracking Acquisition process (93) updates the delta value for the sawtooth function implemented by the Sample Clock Generator process (94). It puts the updated value in the context (115), where it is retrieved by the slave thread op (116), which serves RPC requests from the Sample Clock Generator process (94). In this way the Tracking Acquisition process (93) influences the frequency of the clock generated by the Sample Clock Generator process (94). This example shows how the context (115) is used for communication between threads inside the same process, whereas the RPC mechanism is used for communication between threads in different processes. The locking (117) and unlocking (118) of the context (115) is required to avoid concurrent accesses to the variable delta. The lock (117) in the slave thread op (116) locks the context (115) for read: other threads are still allowed to read from the context (115), but no other thread may write the context (115). The lock (119) in the slave thread ip (114) locks the context (115) for write: no other thread is allowed to write or read the context (115) until it is unlocked again.
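
Python's standard library has no read/write lock, so the sketch below builds one from `threading.Condition` to illustrate the two lock modes described for the context (115). The class and function names are hypothetical. This simple version lets a continuous stream of readers starve a writer, which a production lock would have to prevent.

```python
import threading
from contextlib import contextmanager
from typing import Dict, Iterator


class Context:
    """Variables shared by the threads of one process, with a read/write lock."""

    def __init__(self, **variables: int) -> None:
        self.vars: Dict[str, int] = dict(variables)
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    @contextmanager
    def locked_for_read(self) -> Iterator[Dict[str, int]]:
        # Other readers may enter concurrently; writers must wait.
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1
        try:
            yield self.vars
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

    @contextmanager
    def locked_for_write(self) -> Iterator[Dict[str, int]]:
        # Exclusive: no other thread may read or write meanwhile.
        with self._cond:
            while self._writing or self._readers > 0:
                self._cond.wait()
            self._writing = True
        try:
            yield self.vars
        finally:
            with self._cond:
                self._writing = False
                self._cond.notify_all()


context = Context(delta=1)


def slave_thread_ip(new_delta: int) -> None:
    """Tracking & Acquisition: store an updated delta under the write lock."""
    with context.locked_for_write() as shared:
        shared["delta"] = new_delta


def slave_thread_op() -> int:
    """Serves RPCs from the Sample Clock Generator under the read lock."""
    with context.locked_for_read() as shared:
        return shared["delta"]


updater = threading.Thread(target=slave_thread_ip, args=(3,))
updater.start()
updater.join()
```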

Each process is described in the language that best fits the characteristics of the function it implements. The data-flow blocks (NCO (101), Down-conversion (100), Decimation (99), Chip Matched Filter (96), Phase Correction (97), Correlator Noise Estimator (98), and Sample Clock Generator (94)) are described in DFL. The control-oriented blocks (Tracking Acquisition (93), Frame Extraction (92) and User Interface (91)) are described in C. The code in FIGURE 11 is pseudo-code meant for illustration and does not correspond to the actual code.

DESIGN PROCESS 


After the initial specification of the system has been validated by simulation, the designer starts the refinement process.

At this moment it is not yet decided which process will be implemented on what kind of target processor, nor is it defined how the RPC communication will be refined. However, the choice of the specification language for each process restricts the choice of the component compiler and in that sense partly determines the target processor. Hence, studying possible alternative assignments of a process to a target processor may require the availability of a description of the process in more than one specification language, or a clear guess of the best solution.

ALLOCATION AND ASSIGNMENT 

This step determines which processes will be implemented on which target processor. The initial specification shows the finest-grain partitioning: a process in the initial specification will never be split over several processors. However, it may be worthwhile to combine a number of processes inside a single processor. This is achieved by merging those processes into a single process that can then be mapped on the selected target processor by a host language compiler. Merging of processes is only allowed when the processes are described in the same specification language. Hence, studying possible alternative mergers may require that for a number of processes (e.g. the Correlator Noise Estimator process) a description is available in more than one specification language. After allocation and assignment, one obtains a description with a one-to-one mapping of merged processes to processors.

In the pager example (FIGURE 12), the following allocation, merging and assignment take place.

The NCO (120), Down-conversion (121), and Decimation (122) processes are merged and mapped in hardware onto an application-specific DSP processor (123) because the sample rate of the merged processes is identical, which implies that they can be clocked at the same frequency. The advantage is that only one clock tree needs to be generated per merged process (instead of one per original process). An additional advantage is that the scan chains for the processes that are merged can be combined.

The Chip Matched Filter (124) and Phase Correction (125) processes are merged and mapped onto a CATHEDRAL-3 processor (126) because their sample rates are identical.

The Correlator Noise Estimator process (127) is mapped onto a CATHEDRAL-3 processor (128). It is not merged with the Phase Correction process (125) because it operates at a four times lower frequency.

The Sample Clock Generator (129) is mapped onto a CATHEDRAL-3 processor (130).

Tracking Acquisition (131), Frame Extraction (132), and User Interface (133) are merged and mapped on a programmable processor (134). For this design an ARM6 processor is chosen.
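
The two merging rules stated under ALLOCATION AND ASSIGNMENT (a process is never split over several processors, and only processes written in the same specification language may be merged) can be checked mechanically. The sketch below encodes the pager's languages and the allocation listed above. The function `check_assignment` is hypothetical. The sample-rate argument used to justify the mergers is not checked, because the rates themselves are not given.

```python
from typing import Dict, List

# Specification language of each pager process, as stated in the text.
LANGUAGE: Dict[str, str] = {
    "NCO": "DFL", "Down-conversion": "DFL", "Decimation": "DFL",
    "Chip Matched Filter": "DFL", "Phase Correction": "DFL",
    "Correlator Noise Estimator": "DFL", "Sample Clock Generator": "DFL",
    "Tracking Acquisition": "C", "Frame Extraction": "C", "User Interface": "C",
}


def check_assignment(groups: Dict[str, List[str]]) -> List[str]:
    """Return the rule violations of a processor -> merged-processes mapping."""
    problems = []
    seen: Dict[str, str] = {}
    for processor, processes in groups.items():
        for p in processes:
            if p in seen:  # a process is never split over processors
                problems.append(f"{p} assigned to both {seen[p]} and {processor}")
            seen[p] = processor
        languages = {LANGUAGE[p] for p in processes}
        if len(languages) > 1:  # merging requires one specification language
            problems.append(f"{processor} mixes {sorted(languages)}")
    for p in LANGUAGE:
        if p not in seen:
            problems.append(f"{p} is not assigned")
    return problems


PAGER = {
    "DSP (123)": ["NCO", "Down-conversion", "Decimation"],
    "CATHEDRAL-3 (126)": ["Chip Matched Filter", "Phase Correction"],
    "CATHEDRAL-3 (128)": ["Correlator Noise Estimator"],
    "CATHEDRAL-3 (130)": ["Sample Clock Generator"],
    "ARM6 (134)": ["Tracking Acquisition", "Frame Extraction", "User Interface"],
}
```

Moving the Correlator Noise Estimator onto the ARM6 as well would violate both rules at once: the process would be split, and a C group would be merged with a DFL process.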

The hardware/software trade-offs are based on the following observations. To obtain a maximal degree of flexibility, as much of the functionality as possible is implemented in software on the ARM6 (134). However, due to performance constraints of the ARM6 processor (134), there is a limit to what can be implemented in software. The two main factors that play a role in this problem are the following.

The Tracking Acquisition process (131) has to be implemented in software because the algorithm used to perform tracking and acquisition may be modified depending on the application domain of the pager system.

The Correlator Noise Estimator process (127) is not implemented in software because its input rate is too high to realize real-time communication between the ARM6 and the Phase Correction process (125). In addition, an estimation of the number of cycles required to execute each function on the ARM6 shows that implementing the Correlator Noise Estimator process (127) in software leaves insufficient time to perform tracking and acquisition in between every two symbols.

After merging, each of the merged processes can now be implemented on a separate target processor by the appropriate compiler. The communication between the merged processes is still done via primitive ports and channels.

Communication Mechanism Selection

After the partitioning of the system has been verified by simulation and before the actual implementation takes place, the designer may choose to refine the communication mechanism between the processors. This can be achieved by making the behavior of the channels between the processors explicit.

In the running example, the processors can, in principle, operate concurrently because each processor has its own thread of control. By refining the RPC-based communication scheme, we can pipeline the processors: all processors operate concurrently and synchronize at I/O points. This refined communication scheme is called Blocked/UnBlocked Read/Write communication. FIGURE 13 shows the pager with the refined communication mechanism. The inputs and outputs of the processors have been labeled with BW for Blocked Write, BR for Blocked Read, and UBR for UnBlocked Read.

BW-BR communication guarantees that no data is ever lost. When the writing process has data available, it signals that to the reading process. If the reading process is at that moment not ready to receive the data (because it is still processing the previous data), the writing process blocks until the reading process is ready to communicate. Conversely, if the reading process needs new data, it signals that to the writing process. If the writing process is at that moment not ready to send the data (because it is still computing the data), the reading process blocks until the writing process is ready to communicate. The BW-BR scheme is used in the main signal path.

A BW-BR scheme, however, is not used for the parameter and mode setting of the main signal path. If an accelerator uses BR to read a parameter value, it is blocked until the parameter is provided. Since the parameter setting is done in software, this would slow down the computations in the main signal path considerably. Therefore, parameter setting is done via a BW-UBR scheme. This makes sure that every parameter change is read by the accelerators, but leaves it up to the accelerator to decide when to read the parameter.

In the CoWare design environment, the refinement of the communication mechanism is performed by making use of a hierarchical channel. A hierarchical channel replaces a primitive channel by a process that describes how communication over that channel is carried out.

The introduction of BW-BR communication is shown in detail in FIGURE 14 for the Chip Matched Filter & Phase Correction process (135) and the Correlator & Noise Estimator process (136).

The BWBR channel (137) contains an autonomous thread (138) and a slave thread (139) that communicate with each other via the shared variable tmp[0..7] in the context (140). The slave thread (139) is activated by an RPC (141) from the CMF & Phase Correction process (135) and tries to update the context (140) with new values. The autonomous thread (138) continuously tries to read the values from the context (140) and send them to the output port (144) via an RPC (142), in this way activating the Correlator & Noise Estimator process (136) that is attached to that output (144). The blocking character of the communication is taken care of by the use of a binary semaphore rw (143). This guarantees that the input thread (139) blocks until the previous data has been read by the autonomous thread (138) (no data is overwritten before it has been read), and that the autonomous thread (138) blocks until new data is available (no data is read twice). When the input slave thread (139) is blocked, the CMF & Phase Correction process (135) that requested its service via an RPC (141) is also blocked, because the RPC (141) only returns after the slave thread (139) has completed. When the autonomous thread (138) is blocked, there are no RPC requests to the Correlator Noise Estimator process (136), so that process (136) is blocked automatically.
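
The following is a minimal threading sketch of the BWBR channel (137). It is a Python illustration, not the C-like code of FIGURE 14. The text uses a single binary semaphore rw. This sketch instead uses the common pair of semaphores `empty` and `full` to express the two guarantees (no overwrite before read, no double read), and a separate lock only protects the shared variable tmp. All names are hypothetical.

```python
import threading
from typing import Callable, List, Optional


class BWBRChannel:
    """Hierarchical channel with blocked-write / blocked-read semantics."""

    def __init__(self, output_rpc: Callable[[List[int]], None]) -> None:
        self.output_rpc = output_rpc          # RPC on the output port (144)
        self.tmp: Optional[List[int]] = None  # shared variable in the context
        self.lock = threading.Lock()          # mutual exclusion on tmp only
        self.empty = threading.Semaphore(1)   # available: tmp may be overwritten
        self.full = threading.Semaphore(0)    # available: tmp holds unread data

    def input_slave_thread(self, values: List[int]) -> None:
        """Runs for every RPC on the input; the caller blocks until it returns."""
        self.empty.acquire()                  # wait until previous data was read
        with self.lock:
            self.tmp = list(values)
        self.full.release()

    def autonomous_step(self) -> None:
        """One iteration of the autonomous thread."""
        self.full.acquire()                   # wait until new data is available
        with self.lock:
            data = self.tmp
        self.empty.release()
        if data is not None:
            self.output_rpc(data)


received: List[List[int]] = []
channel = BWBRChannel(received.append)


def run_reader(iterations: int) -> None:
    for _ in range(iterations):
        channel.autonomous_step()


reader = threading.Thread(target=run_reader, args=(3,))
reader.start()
for k in range(3):  # the writer blocks whenever the reader lags behind
    channel.input_slave_thread([k] * 8)
reader.join()
```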

In the case of Blocked-Write, UnBlocked-Read communication, the code for the autonomous thread is slightly modified. The thread always sends the value stored in the context, without checking whether it has been updated. The same value can be sent more than once, but the thread is never blocked. The input slave thread is identical to the BWBR case and blocks until the data has been read.

In both cases, locking and unlocking of the context is required to avoid concurrent accesses to the shared variable in the context and, as such, has nothing to do with the blocking character of the communication.
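
For the BW-UBR variant, a sketch in the same style changes only the reader. The reader never waits for fresh data and always forwards the value currently stored. The writer still blocks until its previous value has been read at least once. The initial value and the `fresh` flag are assumptions of this illustration, and the class name is hypothetical.

```python
import threading
from typing import Callable, List


class BWUBRChannel:
    """Blocked write / unblocked read: the reader always gets the latest value."""

    def __init__(self, output_rpc: Callable[[bool], None], initial: bool) -> None:
        self.output_rpc = output_rpc
        self.value = initial                 # register holding the parameter
        self.fresh = False                   # True until the new value is read
        self.lock = threading.Lock()         # protects value and fresh
        self.empty = threading.Semaphore(1)  # writer waits on this, as in BW-BR

    def input_slave_thread(self, value: bool) -> None:
        self.empty.acquire()                 # block until the last write was read
        with self.lock:
            self.value = value
            self.fresh = True

    def autonomous_step(self) -> None:
        with self.lock:                      # never waits for new data
            value = self.value
            if self.fresh:
                self.fresh = False
                self.empty.release()         # the pending write has now been read
        self.output_rpc(value)


sent: List[bool] = []
par_channel = BWUBRChannel(sent.append, initial=False)
par_channel.autonomous_step()  # no new parameter yet: the old value is resent
par_channel.autonomous_step()
par_channel.input_slave_thread(True)
par_channel.autonomous_step()
par_channel.autonomous_step()
par_channel.input_slave_thread(False)
par_channel.autonomous_step()
```

Here, as in the BW-BR sketch, the lock only protects the shared state, while the semaphore carries the blocking behavior.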

Implementation of the pager 

After the newly introduced communication mechanism has been verified by simulation, each process has to be synthesized on its assigned target processor.

Implementation of a Process in Hardware

FIGURE 15 illustrates the pure hardware implementation of the Correlator & Noise Estimator process (145) and the merged Phase Correction and Chip Matched Filter process (146).

This hardware implementation for the pager consists of three distinct steps:

The (merged) DFL processes are synthesized by the CATHEDRAL silicon compiler. The compiler generates processors whose inputs and outputs are all of the master type. These processors are shown in FIGURE 15 as the inner rectangles (147, 148).

Each processor is encapsulated to make it consistent with the specification, in which the DFL processes have slave inputs. In addition, the encapsulation includes clock-gating circuitry to control the activity of the processor. The encapsulated processors are shown in FIGURE 15 as the big rectangles (149, 150): they include the processor generated by CATHEDRAL (147, 148) and some encapsulation hardware (151, 152, 153). As can be observed, the input ports (154, 155) of the encapsulated processors (149, 150) are now of the slave type. The encapsulation hardware (151, 152, 153) is shown in detail in FIGURE 16 as the blocks (155, 156, 157).

The BWBR process is implemented in hardware. In this case we obtain the gate-level implementation of this process from the library. This implementation is functionally equivalent to the original C-like description (137) of this block in FIGURE 14. FIGURE 16 shows the detailed implementation (158) of the BWBR process that is used in the main signal path of the pager.

Implementation of a Process in Software

To simplify the discussion, we only look at the transfer of the 14 correlation values to the Tracking Acquisition process and the setting of a parameter value by the User Interface. We also know that the transfer of the correlation values has to be BWBR and the transfer of the parameter value has to be BWUBR.

The hardware interface and the software I/O device driver are generated automatically with SYMPHONY. To generate these interfaces, SYMPHONY analyses the ports of the software process. For each of these ports, SYMPHONY scans the library of I/O scenarios for an applicable scenario. The user is asked to select the most appropriate scenario among the applicable ones. SYMPHONY then combines the selected scenarios into a software I/O device driver and a hardware interface.

In the example (FIGURE 17) there are two ports to be implemented:

- bool[32][14] corr; inslave (159) is used to transfer the correlation values. This is a port of type inslave that transports an array of 14 bit vectors of size 32.
- bool par; outmaster (160) is used to set a parameter of the correlation block. This is a port of type outmaster that transports a boolean value.

For the corr port (159), SYMPHONY proposes the scenario depicted in FIGURE 17. The memory port of the ARM is used to transfer the correlation values and the FIQ port of the processor is used to initiate the transfer. The I/O scenario describes which blocks need to be inserted in software and hardware to realize this kind of communication. In total, three hardware and three software blocks are required to implement the communication over the corr port.

Unpack (161). The memory port of the ARM is obviously not wide enough to transfer the 14 correlation values in parallel. Therefore, the scenario sequentializes the transfer. Of the 14 correlation values, 13 are stored internally in the Unpack block (161). The 14th value is sent to the Split block (162).

Split (162) stores the 14th value internally and then activates the FIQ port of the ARM processor. Activating the FIQ port (163) of the hardware model (164) causes an RPC to be issued on the interrupt port (165) of the software model (166). This port (165) is connected to the Join (167) block.

Join (167) retrieves the 14th correlation value by issuing an RPC to the corresponding input port (168) of the Demux block (169).

Demux (169). The data transfer is implemented through memory-mapped I/O. Therefore, when selecting this I/O scenario, the user should decide on the address that will be used for the transfer. When one of the input ports of the Demux block is activated, an RPC to the memory port (170) is performed with an address that corresponds to the activated input port.

Mux (171). On the hardware side, the memory port (172) issues an RPC to the Mux (171) block whenever it (172) is activated. In that block (171), the address is decoded and the corresponding output port is activated to retrieve the correlation value that was stored locally in hardware (either in the Split (162) or in the Unpack block (161)).

Pack (173). After the 14th correlation value has been retrieved by the Join block (167), it is passed on to the Pack block (173), which then retrieves the 13 other correlation values by issuing consecutive RPCs to the different ports of the Demux block (169). Finally, when the Pack block (173) has retrieved all 14 values, it packs them in an array and activates the original software application code in the Tracking & Acquisition process (174).
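
The cooperation of the six blocks can be condensed into the following sketch. It is a simplified, single-threaded Python model under stated assumptions: each 32-bit correlation value occupies one memory word, the user-chosen base address `CORR_BASE` is hypothetical, and the three hardware blocks (Unpack, Split, Mux) and the three software blocks (Join, Demux, Pack) are folded into one class per side.

```python
from typing import Callable, List, Optional

N_CORR = 14          # number of correlation values per transfer
CORR_BASE = 0x8000   # hypothetical address window chosen by the user


class CorrHardwareSide:
    """Unpack + Split + Mux: buffer the values and serve memory-mapped reads."""

    def __init__(self) -> None:
        self.stored: List[int] = [0] * N_CORR
        self.raise_fiq: Optional[Callable[[], None]] = None  # FIQ port of the core

    def corr_rpc(self, values: List[int]) -> None:
        """RPC on the corr inslave port (159)."""
        if len(values) != N_CORR:
            raise ValueError(f"expected {N_CORR} correlation values")
        self.stored = list(values)  # Unpack keeps 13 values, Split the 14th
        if self.raise_fiq is not None:
            self.raise_fiq()        # Split activates the FIQ port

    def mem_read(self, addr: int) -> int:
        """Mux: decode the address and return the matching stored value."""
        index = addr - CORR_BASE
        if not 0 <= index < N_CORR:
            raise ValueError(f"address {addr:#x} is not mapped")
        return self.stored[index]


class CorrSoftwareSide:
    """Join + Demux + Pack: fetch the values and hand them to the application."""

    def __init__(self, mem_read: Callable[[int], int],
                 application: Callable[[List[int]], None]) -> None:
        self.mem_read = mem_read        # Demux: one address per value
        self.application = application  # Tracking & Acquisition code

    def fiq_isr(self) -> None:
        last = self.mem_read(CORR_BASE + N_CORR - 1)  # Join: the 14th value
        first = [self.mem_read(CORR_BASE + i) for i in range(N_CORR - 1)]  # Pack
        self.application(first + [last])


received: List[List[int]] = []
hw = CorrHardwareSide()
sw = CorrSoftwareSide(hw.mem_read, received.append)
hw.raise_fiq = sw.fiq_isr
hw.corr_rpc(list(range(100, 114)))
```

Because the application is called from `fiq_isr`, the sketch also mirrors the in-lining described below, where the tracking and acquisition code ends up in the interrupt routine.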

All these blocks are described in a generic way in a 
library where they can be retrieved and customized by 
SYMPHONY. 

The solution for the par port (160) is much simpler. Since it is an outmaster port, it can be mapped directly on the memory port. However, since the memory port is already used, an extra multiplexer (175) is required. This is shown in FIGURE 17. To implement the unblocked-read character of the transfer, an extra register (176) is required on the hardware side.

Before going on with the implementation path, all processes that were added by SYMPHONY are merged. The hardware interfaces are merged into one hardware interface block that can then be implemented with RT-level synthesis tools. The I/O device driver processes are merged with the original SW application code. As a consequence of the in-lining, the complete tracking and acquisition slave thread moves into the interrupt routine. Whenever new correlation values are ready, the main software thread is interrupted to run the tracking and acquisition algorithm. After that interrupt is processed, the main thread resumes.

In the above description, a design environment and a design methodology have been disclosed that meet the requirements of modularity, encapsulation of different description languages, modeling from a heterogeneous conceptual specification to a resulting heterogeneous architecture and all refinement steps in between, modeling capabilities for off-the-shelf components and the associated design environments, separation between functional and communication behavior, and processor-independent interface synthesis. Yet it is apparent that other embodiments of the present invention may be obvious to the person skilled in the art, the spirit and scope of the present invention being limited only by the terms of the appended claims.



Claims 

1, A design environment for implementing an hetero- 
geneous essentially digital system, comprising: 

a database compiled on a corrrputer, adapted 
for access by executable programs on said 
computer for generating the implementation of 
said heterogeneous essentially digital system, 
comprising a pluraitty of objects representing 
aspects of said digital system wherein said ob- 
jects comprise primitive objectfi representtng 
the specification of said digital system arKi hi- 
erarchical objects being created by said exe- 
cutable programs while generating the imple- 



nnentaiion of said digital system, said hierarchi- 
cal objects being refinements of said primitive 
objects and having more detail and preserving 
any one or all of sakl aspects to thereby gen- 

5 erate said implementation of said di^l sys- 

tem; and further comprising relations inbe- 
tween sakJ primitive objects and inbetween 
said hierarchical objects and between said 
primitive objects and said hierarchical objects ; 

10 and further comprising functions for manipulat- 

ing said objects and said relations; 
means for specifying said heterogeneous dig- 
ital system comprising a plurality ot behavioral 
and structural languages: 

75 means for simulating said heterogeneous dig- 

ital system comprising a plurality of simulators 
tor said behavioral and structural languages; 
means for implementing said heterogeneous 
digital system comprising a plurality ot compil- 

£0 ers (or sakJ behavioral and structural languag- 

es; 

means for allocating hardware components for 
an implementalwn of said heterogeneoi/s dig- 
ital system; 

25 means for assigning hardware subsysterr^s and 

software subsystenns of said heterogeneous 
digital system to said hardware components: 
means for Implementing the communication 
between said software subsystems and said 

3C hardware subsystems, one of the aspects of 

said communication being represented by 
ports; 

means lor encapsulating said simulators, said 
compilers, said hardware components, said 

35 hardware subsystems and said software sub- 

systems whereby creating a consistent com- 
munication between said encapsulated simula- 
tors, compilers, ftardware components, hard- 
ware subsystems arK* software subsystenvs: 

40 and 

means for creating processor models of said hardware components as objects in said database, said models comprising software models representing the software views on said hardware components and hardware models representing the hardware views on said hardware components.
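
As a minimal sketch only, the object database of claim 1 can be pictured as primitive objects, hierarchical objects that refine them, and recorded relations between the two. The claim prescribes no data model or programming language; the Python classes `DesignObject` and `DesignDatabase`, the method `refine`, and the relation label `"refines"` are hypothetical illustrations, not the environment's actual interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DesignObject:
    """One object in the (hypothetical) design database."""
    name: str
    kind: str  # e.g. "process", "port", "channel", "protocol"
    refines: DesignObject | None = None  # None marks a primitive object
    children: list[DesignObject] = field(default_factory=list)

    @property
    def is_primitive(self) -> bool:
        return self.refines is None


class DesignDatabase:
    """Stores objects plus (source, relation, target) triples."""

    def __init__(self) -> None:
        self.objects: dict[str, DesignObject] = {}
        self.relations: list[tuple[str, str, str]] = []

    def add(self, obj: DesignObject) -> DesignObject:
        self.objects[obj.name] = obj
        return obj

    def relate(self, source: str, relation: str, target: str) -> None:
        # Relations may link primitive-primitive, hierarchical-hierarchical
        # or primitive-hierarchical objects; both ends must already exist.
        if source not in self.objects or target not in self.objects:
            raise KeyError("both objects must be in the database")
        self.relations.append((source, relation, target))

    def refine(self, base_name: str, new_name: str,
               details: list[DesignObject]) -> DesignObject:
        """Create a hierarchical object refining an existing object.

        The refinement keeps the base object's kind (one preserved aspect)
        and adds detail as child objects.
        """
        base = self.objects[base_name]
        refined = self.add(DesignObject(new_name, base.kind, refines=base,
                                        children=list(details)))
        self.relate(new_name, "refines", base_name)
        return refined


db = DesignDatabase()
db.add(DesignObject("filter", "process"))
impl = db.refine("filter", "filter_impl", [DesignObject("mac_unit", "process")])
print(impl.is_primitive, db.relations)  # False [('filter_impl', 'refines', 'filter')]
```

Here the refinement preserves only one aspect (the object's `kind`) while adding detail as child objects; an environment of the kind claimed would also have to preserve functional, communication or concurrency aspects.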

2. The design environment as recited in claim 1 further comprising means for creating I/O scenario models of said ports as objects in said database, said I/O scenario models representing the implementation of said ports on said hardware components, said implementation comprising software subsystems, hardware subsystems, and processor models with connections therebetween.

3. The design environment as recited in claim 2 wherein the implementation of the communication between a first software subsystem and a first hardware subsystem results in said first software subsystem with a first port being replaced by a second hardware subsystem with a second port, said first port and said second port representing an essentially identical communication.

4. The design environment as recited in claim 3 further comprising:
means for selecting I/O scenario models for the ports of said first software subsystem;
means for combining the software subsystems of said selected I/O scenarios;
means for combining the hardware subsystems of said selected I/O scenarios.

5. The design environment as recited in claim 4 wherein a first I/O scenario model represents the connection of said first port to said second port, said connection comprising a connection of said first port to said software subsystems of said I/O scenario model, connections of said software subsystems of said I/O scenario model to said software model, connections of said hardware model to said hardware subsystems of said I/O scenario model, and a connection of said hardware subsystems of said I/O scenario model to said second port.

6. The design environment as recited in claim 5 wherein I/O scenario models comprise memory-mapped I/O scenarios, instruction-programmed I/O scenarios, and interrupt-based I/O scenarios.
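
Claims 4 to 6 describe picking an I/O scenario model for each port of a software subsystem, combining the software and hardware subsystems of the chosen scenarios, and wiring them in a fixed order. The following is a sketch under stated assumptions: the names `ScenarioKind`, `IOScenario`, `connection_chain` and `combine`, and the example parts (`write_reg`, `isr`, `addr_decoder`, `irq_ctrl`), are hypothetical, and reading "combining" as "merge, keeping shared parts once" is an assumption rather than something the claims specify.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ScenarioKind(Enum):
    # The three I/O scenario families named in claim 6.
    MEMORY_MAPPED = "memory-mapped"
    INSTRUCTION_PROGRAMMED = "instruction-programmed"
    INTERRUPT_BASED = "interrupt-based"


@dataclass(frozen=True)
class IOScenario:
    """Hypothetical I/O scenario model selected for one port."""
    kind: ScenarioKind
    sw_parts: tuple[str, ...]  # software subsystems (e.g. driver routines)
    hw_parts: tuple[str, ...]  # hardware subsystems (e.g. address decoders)

    def connection_chain(self, first_port: str, second_port: str,
                         sw_model: str, hw_model: str) -> list[str]:
        # Order given in claim 5: first port -> scenario SW subsystems ->
        # processor software model; processor hardware model ->
        # scenario HW subsystems -> second port.
        return [first_port, *self.sw_parts, sw_model,
                hw_model, *self.hw_parts, second_port]


def _append_unique(target: list[str], items: tuple[str, ...]) -> None:
    for item in items:
        if item not in target:
            target.append(item)


def combine(selected: dict[str, IOScenario]) -> tuple[list[str], list[str]]:
    """Combine the SW and HW subsystems of the scenarios chosen per port
    (claim 4), keeping shared parts only once (assumed semantics)."""
    sw: list[str] = []
    hw: list[str] = []
    for scenario in selected.values():
        _append_unique(sw, scenario.sw_parts)
        _append_unique(hw, scenario.hw_parts)
    return sw, hw


mm = IOScenario(ScenarioKind.MEMORY_MAPPED, ("write_reg",), ("addr_decoder",))
irq = IOScenario(ScenarioKind.INTERRUPT_BASED, ("isr",), ("addr_decoder", "irq_ctrl"))
print(combine({"p_out": mm, "p_in": irq}))
# (['write_reg', 'isr'], ['addr_decoder', 'irq_ctrl'])
```

In this sketch two scenarios that both need an address decoder share a single instance after combination, which is one plausible reason to combine subsystems rather than simply collect them.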

7. The design environment as recited in claim n wherein said implementation is a simulation of said digital system.

8. The design environment as recited in claim 7 wherein said simulation is a multi-platform simulation being executed on a plurality of computers.

9. The design environment as recited in claim 8 wherein said simulation is a hybrid simulation comprising substantially simultaneous hardware implementations and computer simulations.

10. The design environment as recited in claim 1 wherein said implementation is a heterogeneous implementation comprising hardware subsystems and software subsystems, said software subsystems being executed on one or more of said hardware subsystems.

11. The design environment as recited in claim 10 wherein said hardware subsystems comprise any one or more of processor cores, off-the-shelf components, custom components, ASICs, processors, or boards.

12. The design environment as recited in claim 10 wherein said aspects are functional or communication or concurrency or structural aspects of said digital system.

13. A method of making an implementation of a heterogeneous essentially digital system, said implementation comprising hardware and software subsystems of said system, said software subsystem being executed on one or more of said hardware subsystems, comprising the steps of:
defining a first set of primitive objects representing the specification of said digital system, comprising the steps of:
describing the specification of said system in one or more processes, each process representing a functional aspect of said system, said processes being primitive objects;
defining ports and connecting said ports with channels, said ports structuring the communication between said processes, said ports and said channels being primitive objects, one process having one or more ports;
defining the communication semantics of said ports by a protocol, said protocol being a primitive object;
and thereafter
creating hierarchical objects being refinements of said primitive objects and having more detail, while preserving aspects of said communication semantics;
allocating one or more hardware components, said components comprising programmable processors and non-programmable processors;
assigning said processes to said hardware components, the processes being assigned to a programmable processor being a software subsystem, the other processes being hardware subsystems; and
selecting I/O scenario models for the ports of said software subsystem, thereby connecting said ports to the interface of said programmable processor and connecting the interface of said programmable processor to second ports, said second ports representing a communication essentially identical to that of said ports.
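
The specification and partitioning steps of claim 13 can be sketched as follows, assuming a simple Python model. The classes `Port`, `Process`, `Channel` and `Component`, the functions `assign` and `ports_needing_io_scenarios`, and the example names (`producer`, `consumer`, `cpu0`, `asic0`, protocol `"handshake"`) are hypothetical; the method itself does not fix any data representation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Port:
    name: str
    protocol: str  # communication semantics of the port, e.g. "handshake"


@dataclass
class Process:
    name: str
    ports: list[Port] = field(default_factory=list)


@dataclass
class Channel:
    a: Port
    b: Port


@dataclass
class Component:
    name: str
    programmable: bool  # True for a programmable processor


def assign(mapping: dict[str, Component],
           processes: list[Process]) -> dict[str, list[str]]:
    """Processes on a programmable processor become software subsystems;
    all other processes become hardware subsystems."""
    split: dict[str, list[str]] = {"software": [], "hardware": []}
    for proc in processes:
        side = "software" if mapping[proc.name].programmable else "hardware"
        split[side].append(proc.name)
    return split


def ports_needing_io_scenarios(mapping: dict[str, Component],
                               processes: list[Process]) -> list[str]:
    # Final step of claim 13: every port of a software subsystem needs an
    # I/O scenario model to reach the processor interface.
    return [f"{proc.name}.{port.name}"
            for proc in processes if mapping[proc.name].programmable
            for port in proc.ports]


producer = Process("producer", [Port("out", "handshake")])
consumer = Process("consumer", [Port("in", "handshake")])
link = Channel(producer.ports[0], consumer.ports[0])
mapping = {"producer": Component("cpu0", True),
           "consumer": Component("asic0", False)}
print(assign(mapping, [producer, consumer]))
print(ports_needing_io_scenarios(mapping, [producer, consumer]))
```

Only the port of the process mapped to the programmable component is reported, since hardware subsystems keep their ports and need no I/O scenario in this simplified model.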

14. The method as recited in claim 13 further comprising the step of simulating said system.

15. The method as recited in claim 13 wherein said hardware subsystems comprise any one or more of processor cores, off-the-shelf components, custom components, ASICs, processors, or boards.

16. The method as recited in claim 15 further comprising the step of refining the channel in between a first and a second port of, respectively, a first and a second hardware component, said first and said second port having incompatible protocols, thereby creating a hierarchical channel, said hierarchical channel converting the first protocol into the second protocol.
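
A hierarchical channel in the sense of claim 16 presents one protocol to each of the two ports it joins and performs the conversion internally. The claim names no concrete protocols, so the following is only a minimal sketch under an assumed pairing: the first port writes 16-bit words, the second reads 8-bit bytes, and the class `WordToByteChannel` with its methods `write_word` and `read_byte` is hypothetical.

```python
from __future__ import annotations

from collections import deque


class WordToByteChannel:
    """Hypothetical hierarchical channel joining two incompatible ports.

    Assumed protocols, for illustration only: the first port writes 16-bit
    words and the second port reads 8-bit bytes. The converter inside the
    channel splits each word into two bytes, low byte first.
    """

    def __init__(self) -> None:
        self._fifo: deque[int] = deque()

    def write_word(self, word: int) -> None:
        # Interface seen by the first port (word protocol).
        if not 0 <= word <= 0xFFFF:
            raise ValueError("word must fit in 16 bits")
        self._fifo.append(word & 0xFF)         # low byte
        self._fifo.append((word >> 8) & 0xFF)  # high byte

    def read_byte(self) -> int:
        # Interface seen by the second port (byte protocol).
        if not self._fifo:
            raise RuntimeError("no byte available")
        return self._fifo.popleft()


ch = WordToByteChannel()
ch.write_word(0x1234)
print(hex(ch.read_byte()), hex(ch.read_byte()))  # 0x34 0x12
```

Because neither port changes, the two hardware components can keep their original interfaces; only the channel between them is refined, which matches the intent of claims 16 and 17.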

17. The method as recited in claim 16 further comprising the step of refining the channels in between incompatible ports of hardware components, thereby creating hierarchical channels.

18. The method as recited in claim 17 further comprising the step of generating a netlist comprising the layout information of said implementation.